Abstract:
We develop a family of accelerated stochastic algorithms that minimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide range of problem settings. To achieve this, we establish a framework based on the classical proximal point algorithm. Namely, we provide several algorithms that reduce the minimization of a strongly convex function to approximate minimizations of regularizations of the function. Using these results, we accelerate recent fast stochastic algorithms in a black-box fashion. Empirically, we demonstrate that the resulting algorithms exhibit notions of stability that are advantageous in practice. Both in theory and in practice, the provided algorithms reap the computational benefits of adding a large strongly convex regularization term, without incurring a corresponding bias to the original problem.

Abstract:
Motivated by the imminent growth of massive, highly redundant genomic databases, we study the problem of compressing a string database while simultaneously supporting fast random access, substring extraction and pattern matching to the underlying string(s). Bille et al. (2011) recently showed how, given a straight-line program with $r$ rules for a string $s$ of length $n$, we can build an $\Oh{r}$-word data structure that allows us to extract any substring of length $m$ in $\Oh{\log n + m}$ time. They also showed how, given a pattern $p$ of length $m$ and an edit distance (k \leq m), their data structure supports finding all \occ approximate matches to $p$ in $s$ in $\Oh{r (\min (m k, k^4 + m) + \log n) + \occ}$ time. Rytter (2003) and Charikar et al. (2005) showed that $r$ is always at least the number $z$ of phrases in the LZ77 parse of $s$, and gave algorithms for building straight-line programs with $\Oh{z \log n}$ rules. In this paper we give a simple $\Oh{z \log n}$-word data structure that takes the same time for substring extraction but only $\Oh{z \min (m k, k^4 + m) + \occ}$ time for approximate pattern matching.

Abstract:
A transversal of a hypergraph is a set of vertices intersecting each hyperedge. We design and analyze new exponential-time algorithms to enumerate all inclusion-minimal transversals of a hypergraph. For each fixed k>2, our algorithms for hypergraphs of rank k, where the rank is the maximum size of a hyperedge, outperform the previous best. This also implies improved upper bounds on the maximum number of minimal transversals in n-vertex hypergraphs of rank k>2. Our main algorithm is a branching algorithm whose running time is analyzed with Measure and Conquer. It enumerates all minimal transversals of hypergraphs of rank 3 on n vertices in time O(1.6755^n). Our algorithm for hypergraphs of rank 4 is based on iterative compression. Our enumeration algorithms improve upon the best known algorithms for counting minimum transversals in hypergraphs of rank k for k>2 and for computing a minimum transversal in hypergraphs of rank k for k>5.

Abstract:
Since being analyzed by Rokhlin, Szlam, and Tygert and popularized by Halko, Martinsson, and Tropp, randomized Simultaneous Power Iteration has become the method of choice for approximate singular value decomposition. It is more accurate than simpler sketching algorithms, yet still converges quickly for any matrix, independently of singular value gaps. After $\tilde{O}(1/\epsilon)$ iterations, it gives a low-rank approximation within $(1+\epsilon)$ of optimal for spectral norm error. We give the first provable runtime improvement on Simultaneous Iteration: a simple randomized block Krylov method, closely related to the classic Block Lanczos algorithm, gives the same guarantees in just $\tilde{O}(1/\sqrt{\epsilon})$ iterations and performs substantially better experimentally. Despite their long history, our analysis is the first of a Krylov subspace method that does not depend on singular value gaps, which are unreliable in practice. Furthermore, while it is a simple accuracy benchmark, even $(1+\epsilon)$ error for spectral norm low-rank approximation does not imply that an algorithm returns high quality principal components, a major issue for data applications. We address this problem for the first time by showing that both Block Krylov Iteration and a minor modification of Simultaneous Iteration give nearly optimal PCA for any matrix. This result further justifies their strength over non-iterative sketching methods. Finally, we give insight beyond the worst case, justifying why both algorithms can run much faster in practice than predicted. We clarify how simple techniques can take advantage of common matrix properties to significantly improve runtime.

Abstract:
We present two algorithms for maintaining the topological order of a directed acyclic graph with n vertices, under an online edge insertion sequence of m edges. Efficient algorithms for online topological ordering have many applications, including online cycle detection, which is to discover the first edge that introduces a cycle under an arbitrary sequence of edge insertions in a directed graph. In this paper we present efficient algorithms for the online topological ordering problem. We first present a simple algorithm with running time O(n^{5/2}) for the online topological ordering problem. This is the current fastest algorithm for this problem on dense graphs, i.e., when m > n^{5/3}. We then present an algorithm with running time O((m + nlog n)\sqrt{m}); this is more efficient for sparse graphs. Our results yield an improved upper bound of O(min(n^{5/2}, (m + nlog n)sqrt{m})) for the online topological ordering problem.

Abstract:
Let {\alpha} be the maximal value such that the product of an n x n^{\alpha} matrix by an n^{\alpha} x n matrix can be computed with n^{2+o(1)} arithmetic operations. In this paper we show that \alpha>0.30298, which improves the previous record \alpha>0.29462 by Coppersmith (Journal of Complexity, 1997). More generally, we construct a new algorithm for multiplying an n x n^k matrix by an n^k x n matrix, for any value k\neq 1. The complexity of this algorithm is better than all known algorithms for rectangular matrix multiplication. In the case of square matrix multiplication (i.e., for k=1), we recover exactly the complexity of the algorithm by Coppersmith and Winograd (Journal of Symbolic Computation, 1990). These new upper bounds can be used to improve the time complexity of several known algorithms that rely on rectangular matrix multiplication. For example, we directly obtain a O(n^{2.5302})-time algorithm for the all-pairs shortest paths problem over directed graphs with small integer weights, improving over the O(n^{2.575})-time algorithm by Zwick (JACM 2002), and also improve the time complexity of sparse square matrix multiplication.

Abstract:
Given an undirected graph $G=(V,E)$ on $n$ vertices, $m$ edges, and an integer $t\ge 1$, a subgraph $(V,E_S)$, $E_S\subseteq E$ is called a $t$-spanner if for any pair of vertices $u,v \in V$, the distance between them in the subgraph is at most $t$ times the actual distance. We present streaming algorithms for computing a $t$-spanner of essentially optimal size-stretch trade offs for any undirected graph. Our first algorithm is for the classical streaming model and works for unweighted graphs only. The algorithm performs a single pass on the stream of edges and requires $O(m)$ time to process the entire stream of edges. This drastically improves the previous best single pass streaming algorithm for computing a $t$-spanner which requires $\theta(mn^{\frac{2}{t}})$ time to process the stream and computes spanner with size slightly larger than the optimal. Our second algorithm is for {\em StreamSort} model introduced by Aggarwal et al. [FOCS 2004], which is the streaming model augmented with a sorting primitive. The {\em StreamSort} model has been shown to be a more powerful and still very realistic model than the streaming model for massive data sets applications. Our algorithm, which works of weighted graphs as well, performs $O(t)$ passes using $O(\log n)$ bits of working memory only. Our both the algorithms require elementary data structures.

Abstract:
In this paper, we present a fast algorithm for constructing a concept (Galois) lattice of a binary relation, including computing all concepts and their lattice order. We also present two efficient variants of the algorithm, one for computing all concepts only, and one for constructing a frequent closed itemset lattice. The running time of our algorithms depends on the lattice structure and is faster than all other existing algorithms for these problems.

Abstract:
We study the problem of releasing $k$-way marginals of a database $D \in (\{0,1\}^d)^n$, while preserving differential privacy. The answer to a $k$-way marginal query is the fraction of $D$'s records $x \in \{0,1\}^d$ with a given value in each of a given set of up to $k$ columns. Marginal queries enable a rich class of statistical analyses of a dataset, and designing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et. al., PODS '07). We give an algorithm that runs in time $d^{O(\sqrt{k})}$ and releases a private summary capable of answering any $k$-way marginal query with at most $\pm .01$ error on every query as long as $n \geq d^{O(\sqrt{k})}$. To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of $k$-way marginal queries, which is $d^{\Theta(k)}$ (for $k \ll d$).

Abstract:
We study the classical approximate string matching problem, that is, given strings $P$ and $Q$ and an error threshold $k$, find all ending positions of substrings of $Q$ whose edit distance to $P$ is at most $k$. Let $P$ and $Q$ have lengths $m$ and $n$, respectively. On a standard unit-cost word RAM with word size $w \geq \log n$ we present an algorithm using time $$ O(nk \cdot \min(\frac{\log^2 m}{\log n},\frac{\log^2 m\log w}{w}) + n) $$ When $P$ is short, namely, $m = 2^{o(\sqrt{\log n})}$ or $m = 2^{o(\sqrt{w/\log w})}$ this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism.