Abstract:
We present an effective heuristic for the Steiner Problem in Graphs. Its main elements are a multistart algorithm coupled with aggressive combination of elite solutions, both leveraging recently proposed fast local searches. We also propose a fast implementation of a well-known dual ascent algorithm that not only makes our heuristic more robust (by quickly dealing with easier cases), but can also be used as a building block of an exact (branch-and-bound) algorithm that is quite effective for some inputs. On all graph classes we consider, our heuristic is competitive with (and sometimes more effective than) any previous approach with similar running times. It is also scalable: with long runs, we could improve or match the best published results for most open instances in the literature.

Abstract:
Closeness centrality, first considered by Bavelas (1948), is a measure of the importance of a node in a network, based on the distances from the node to all other nodes. The classic definition, proposed by Bavelas (1950), Beauchamp (1965), and Sabidussi (1966), is (the inverse of) the average distance to all other nodes. We propose the first highly scalable (near linear-time processing and linear space overhead) algorithm for estimating, within a small relative error, the classic closeness centralities of all nodes in the graph. Our algorithm applies to undirected graphs, as well as to centrality computed with respect to round-trip distances in directed graphs. For directed graphs, we also propose an efficient algorithm that approximates generalizations of classic closeness centrality to outbound and inbound centralities. Although it does not provide worst-case theoretical approximation guarantees, it is designed to perform well on real networks. We perform extensive experiments on large networks, demonstrating high scalability and accuracy.
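For concreteness, the classic quantity being estimated can be computed exactly by one BFS per node; the sketch below (a baseline of ours, not the paper's sampling algorithm, with function and variable names that are our own) does this for an unweighted undirected graph. Its $O(nm)$ cost is precisely what the near linear-time estimator avoids.

```python
from collections import deque

def closeness(adj):
    """Classic closeness centrality: (n-1) divided by the sum of
    distances from each node to all reachable nodes, via one BFS
    per node. `adj` maps each node to its neighbor list."""
    n = len(adj)
    cent = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:                      # plain BFS from s
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total = sum(dist.values())
        cent[s] = (n - 1) / total if total > 0 else 0.0
    return cent
```

On a path graph 0-1-2, the middle node has average distance 1 and hence closeness 1.0, while the endpoints have closeness 2/3.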

Abstract:
We consider a model for diffusion in a network that captures both the scope of infection and its propagation time: The edges of the network have associated lengths which model transmission times, and influence scores are higher for faster propagation. We propose an intuitive measure of {\it timed influence}, which extends and unifies several classic measures, including the well-studied "binary" influence [Richardson and Domingos 2002; Kempe et al. 2003] (which only measures scope), a recently studied {\it threshold} model of timed influence [Gomez-Rodriguez et al. 2011] (which considers a node influenced only within a fixed time horizon), and {\it closeness centrality} (which is extended from being defined for a single node to multiple seed nodes and from a fixed network to distributions). Finally, we provide the first highly scalable algorithms for timed influence computation and maximization. In particular, we improve by orders of magnitude the scalability of state-of-the-art threshold timed influence computation. Moreover, our design provides robust guarantees and is also novel as a theoretical contribution.

Abstract:
Propagation of contagion through networks is a fundamental process. It is used to model the spread of information, influence, or a viral infection. Diffusion patterns can be specified by a probabilistic model, such as Independent Cascade (IC), or captured by a set of representative traces. Basic computational problems in the study of diffusion are influence queries (determining the potency of a specified seed set of nodes) and Influence Maximization (identifying the most influential seed set of a given size). Answering each influence query involves many edge traversals, and does not scale when there are many queries on very large graphs. The gold standard for Influence Maximization is the greedy algorithm, which iteratively adds to the seed set a node maximizing the marginal gain in influence. Greedy has a guaranteed approximation ratio of at least (1-1/e) and actually produces a sequence of nodes, with each prefix having an approximation guarantee with respect to the same-size optimum. Since Greedy does not scale well beyond a few million edges, for larger inputs one must currently use either heuristics or alternative algorithms designed for a pre-specified small seed set size. We develop a novel sketch-based design for influence computation. Our greedy Sketch-based Influence Maximization (SKIM) algorithm scales to graphs with billions of edges, with one to two orders of magnitude speedup over the best greedy methods. It still has a guaranteed approximation ratio, and in practice its quality nearly matches that of exact greedy. We also present influence oracles, which use linear-time preprocessing to generate a small sketch for each node, allowing the influence of any seed set to be answered quickly from the sketches of its nodes.
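The greedy scheme described above can be sketched in a few lines. This is a minimal illustration, not SKIM: it uses deterministic reachability as the (monotone submodular) influence function instead of IC sampling or sketches, and all names are ours. Each round it adds the node with the largest marginal gain in coverage, which yields the classic $(1-1/e)$ guarantee.

```python
def greedy_influence_max(adj, k):
    """Select k seeds greedily by marginal gain in coverage.
    Influence of a seed set = number of nodes reachable from it
    in the directed graph `adj` (a dict: node -> successor list)."""
    def reach(seed_set):
        # Nodes reachable from the seed set, via DFS.
        seen, stack = set(seed_set), list(seed_set)
        while stack:
            u = stack.pop()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    seeds, covered = [], set()
    for _ in range(k):
        best, best_gain = None, -1
        for v in adj:
            if v in seeds:
                continue
            gain = len(reach(set(seeds) | {v})) - len(covered)
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        covered = reach(set(seeds))
    return seeds
```

Note the quadratic number of influence evaluations; the sketch-based design replaces each evaluation with a cheap estimate, which is what makes the greedy sequence computable on billion-edge graphs.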

Abstract:
We study the journey planning problem in public transit networks. Developing efficient preprocessing-based speedup techniques for this problem has been challenging: current approaches either require massive preprocessing effort or provide limited speedups. Leveraging recent advances in Hub Labeling, the fastest algorithm for road networks, we revisit the well-known time-expanded model for public transit. Exploiting domain-specific properties, we provide simple and efficient algorithms for the earliest arrival, profile, and multicriteria problems, with queries that are orders of magnitude faster than the state of the art.

Abstract:
In this paper we compare the ability of discrete-time multivariate Stochastic Volatility models to describe the conditional correlations between stock index returns. We consider four trivariate SV models, which differ in the structure of the conditional covariance matrix. Specifications with zero, constant and time-varying conditional correlations are taken into account. As an example we study trivariate volatility models for the daily log returns on the WIG, S&P500, and FTSE100 indexes. In order to formally compare the relative explanatory power of SV specifications we use the Bayesian principles of comparing statistical models. Our results are based on Bayes factors, computed using Markov Chain Monte Carlo techniques. They indicate that the most adequate specifications are those that allow for time-varying conditional correlations and that have as many latent processes as there are conditional variances and covariances. The empirical results clearly show that the data strongly reject the assumption of constant conditional correlations.

Abstract:
We survey recent advances in algorithms for route planning in transportation networks. For road networks, we show that one can compute driving directions in milliseconds or less even at continental scale. A variety of techniques provide different trade-offs between preprocessing effort, space requirements, and query time. Some algorithms can answer queries in a fraction of a microsecond, while others can deal efficiently with real-time traffic. Journey planning on public transportation systems, although conceptually similar, is a significantly harder problem due to its inherent time-dependent and multicriteria nature. Although exact algorithms are fast enough for interactive queries on metropolitan transit systems, dealing with continent-sized instances requires simplifications or heavy preprocessing. The multimodal route planning problem, which seeks journeys combining schedule-based transportation (buses, trains) with unrestricted modes (walking, driving), is even harder, relying on approximate solutions even for metropolitan inputs.

Abstract:
We consider $n\times n$ real symmetric and Hermitian random matrices $H_{n,m}$ equal to the sum of a non-random matrix $H_{n}^{(0)}$ and of $m$ rank-one matrices determined by $m$ i.i.d. isotropic random vectors with log-concave probability law and i.i.d. random amplitudes $\{\tau_{\alpha }\}_{\alpha =1}^{m}$. This is a generalization of the case of vectors uniformly distributed over the unit sphere, studied in [Marchenko-Pastur (1967)]. We prove that if $n\to \infty$, $m\to \infty$, $m/n\to c\in \lbrack 0,\infty)$, and the empirical eigenvalue measure of $H_{n}^{(0)}$ converges weakly, then the empirical eigenvalue measure of $H_{n,m}$ converges in probability to a non-random limit, found in [Marchenko-Pastur (1967)].

Abstract:
It is proved that if $u_1,\ldots, u_n$ are vectors in ${\Bbb R}^k, k\le n, 1 \le p < \infty$ and $$r = ({1\over k} \sum ^n_1 |u_i|^p)^{1\over p}$$ then the volume of the symmetric convex body whose boundary functionals are $\pm u_1,\ldots, \pm u_n$, is bounded from below as $$|\{ x\in {\Bbb R}^k\colon \ |\langle x,u_i\rangle | \le 1 \ \hbox{for every} \ i\}|^{1\over k} \ge {1\over \sqrt{\rho}r}.$$ An application to number theory is stated.

Abstract:
We give an alternative proof of a recent result of Klartag on the existence of almost subgaussian linear functionals on convex bodies. If $K$ is a convex body in ${\mathbb R}^n$ with volume one and center of mass at the origin, there exists $x\neq 0$ such that $$|\{y\in K: |\langle y,x\rangle |\geq t\|\langle \cdot, x\rangle\|_1\}|\leq\exp (-ct^2/\log^2(t+1))$$ for all $t\geq 1$, where $c>0$ is an absolute constant. The proof is based on the study of the $L_q$--centroid bodies of $K$. Analogous results hold true for general log-concave measures.