Abstract:
We present error-correcting codes that achieve the information-theoretically best possible trade-off between the rate and error-correction radius. Specifically, for every $0 < R < 1$ and $\eps> 0$, we present an explicit construction of error-correcting codes of rate $R$ that can be list decoded in polynomial time up to a fraction $(1-R-\eps)$ of {\em worst-case} errors. At least theoretically, this meets one of the central challenges in algorithmic coding theory. Our codes are simple to describe: they are {\em folded Reed-Solomon codes}, which are in fact {\em exactly} Reed-Solomon (RS) codes, but viewed as a code over a larger alphabet by careful bundling of codeword symbols. Given the ubiquity of RS codes, this is an appealing feature of our result, and in fact our methods directly yield better decoding algorithms for RS codes when errors occur in {\em phased bursts}. The alphabet size of these folded RS codes is polynomial in the block length. We are able to reduce this to a constant (depending on $\eps$) using ideas concerning ``list recovery'' and expander-based codes from \cite{GI-focs01,GI-ieeejl}. Concatenating the folded RS codes with suitable inner codes also gives us polynomial time constructible binary codes that can be efficiently list decoded up to the Zyablov bound, i.e., up to twice the radius achieved by the standard GMD decoding of concatenated codes.
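The bundling step described above can be illustrated with a minimal sketch. It assumes a toy prime field $\mathbb{F}_{13}$, the generator $\gamma = 2$, evaluation points taken in the order $1, \gamma, \gamma^2, \ldots$, and folding parameter $m = 3$; these choices and the helper names `rs_encode` and `fold` are illustrative assumptions, not taken from the abstract. The sketch only shows that a phased burst of three symbol errors becomes a single error over the larger alphabet; it does not implement the list-decoding algorithm.

```python
# Illustrative sketch only: toy field and parameters chosen for readability.
Q = 13        # field size (prime, so field arithmetic is arithmetic mod Q)
GAMMA = 2     # a generator of the multiplicative group mod 13
M = 3         # folding parameter (hypothetical choice)

def rs_encode(message):
    """Evaluate the message polynomial at 1, GAMMA, GAMMA^2, ..., GAMMA^(Q-2)."""
    points = [pow(GAMMA, i, Q) for i in range(Q - 1)]
    return [sum(c * pow(x, j, Q) for j, c in enumerate(message)) % Q
            for x in points]

def fold(codeword, m):
    """Bundle m consecutive symbols into one symbol over the larger alphabet."""
    assert len(codeword) % m == 0
    return [tuple(codeword[i:i + m]) for i in range(0, len(codeword), m)]

message = [5, 1, 7, 2]            # k = 4 coefficients, lowest degree first
codeword = rs_encode(message)     # n = 12 symbols over F_13
folded = fold(codeword, M)        # 4 symbols, each a triple over F_13

# A phased burst: corrupt three consecutive symbols aligned to a fold boundary.
corrupted = list(codeword)
for i in range(3, 6):
    corrupted[i] = (corrupted[i] + 1) % Q

symbol_errors = sum(a != b for a, b in zip(codeword, corrupted))
folded_errors = sum(a != b for a, b in zip(folded, fold(corrupted, M)))
print(symbol_errors, folded_errors)   # prints: 3 1
```

Folding leaves the codewords, and hence the rate, unchanged; only the unit in which errors are counted changes.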

Abstract:
Motivated by applications in storage systems and property testing, we study data stream algorithms for local testing and tolerant testing of codes. Ideally, we would like to know whether there exist asymptotically good codes that can be local/tolerant tested with one-pass, poly-log space data stream algorithms. We show that for the error detection problem (and hence, the local testing problem), there exists a one-pass, log-space data stream algorithm for a broad class of asymptotically good codes, including the Reed-Solomon (RS) code and expander codes. In our technically more involved result, we give a one-pass, $O(e\log^2{n})$-space algorithm for RS (and related) codes with dimension $k$ and block length $n$ that can distinguish between the cases when the Hamming distance between the received word and the code is at most $e$ and at least $a\cdot e$ for some absolute constant $a>1$. For RS codes with random errors, we can obtain $e\le O(n/k)$. For folded RS codes, we obtain similar results for worst-case errors as long as $e\le (n/k)^{1-\eps}$ for any constant $\eps>0$. These results follow by reducing the tolerant testing problem to the error detection problem using results from group testing and the list decodability of the code. We also show that using our techniques, the space requirement and the upper bound of $e\le O(n/k)$ cannot be improved by more than logarithmic factors.

Abstract:
In this work, we introduce a framework to study the effect of random operations on the combinatorial list-decodability of a code. The operations we consider correspond to row and column operations on the matrix obtained from the code by stacking the codewords together as columns. This captures many natural transformations on codes, such as puncturing, folding, and taking subcodes; we show that many such operations can improve the list-decoding properties of a code. There are two main points to this. First, our goal is to advance our (combinatorial) understanding of list-decodability, by understanding what structure (or lack thereof) is necessary to obtain it. Second, we use our more general results to obtain a few interesting corollaries for list decoding: (1) We show the existence of binary codes that are combinatorially list-decodable from a $1/2-\epsilon$ fraction of errors with optimal rate $\Omega(\epsilon^2)$ that can be encoded in linear time. (2) We show that any code with $\Omega(1)$ relative distance, when randomly folded, is combinatorially list-decodable from a $1-\epsilon$ fraction of errors with high probability. This formalizes the intuition for why the folding operation has been successful in obtaining codes with optimal list decoding parameters; previously, all arguments used algebraic methods and worked only with specific codes. (3) We show that any code which is list-decodable with suboptimal list sizes has many subcodes which have near-optimal list sizes, while retaining the error correcting capabilities of the original code. This generalizes recent results where subspace evasive sets have been used to reduce list sizes of codes that achieve list decoding capacity.
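The matrix view described above can be made concrete with a small sketch. It assumes a toy binary code with four codewords and models the three operations as simple random row or column selections; the helper names `puncture`, `fold`, and `subcode` and the exact sampling rules are illustrative assumptions, not the paper's definitions.

```python
import random

# Toy binary code: the 4 codewords of the [4, 2] code spanned by 1100 and 0011.
codewords = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
n = len(codewords[0])

# Codewords are the columns: row i holds coordinate i of every codeword.
matrix = [[c[i] for c in codewords] for i in range(n)]

rng = random.Random(0)  # fixed seed, for reproducibility only

def puncture(matrix, keep, rng):
    """Row operation: keep a random subset of `keep` rows (coordinates)."""
    rows = sorted(rng.sample(range(len(matrix)), keep))
    return [matrix[i] for i in rows]

def fold(matrix, m, rng):
    """Row operation: randomly partition the rows into groups of size m;
    each group becomes one row whose entries are m-tuples."""
    assert len(matrix) % m == 0
    order = list(range(len(matrix)))
    rng.shuffle(order)
    groups = [order[i:i + m] for i in range(0, len(order), m)]
    width = len(matrix[0])
    return [[tuple(matrix[i][j] for i in g) for j in range(width)]
            for g in groups]

def subcode(matrix, size, rng):
    """Column operation: keep a random subset of `size` columns (codewords)."""
    cols = sorted(rng.sample(range(len(matrix[0])), size))
    return [[row[j] for j in cols] for row in matrix]

punctured = puncture(matrix, 3, rng)   # 3 coordinates, 4 codewords
folded = fold(matrix, 2, rng)          # 2 coordinates over pairs, 4 codewords
sub = subcode(matrix, 2, rng)          # 4 coordinates, 2 codewords
print(len(punctured), len(folded), len(sub[0]))
```

Puncturing and folding act on rows, so they change the block length or alphabet; taking a subcode acts on columns, so it changes only the number of codewords.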

Abstract:
We prove the following results concerning the list decoding of error-correcting codes: (i) We show that for \textit{any} code with a relative distance of $\delta$ (over a large enough alphabet), the following result holds for \textit{random errors}: With high probability, for a $\rho\le \delta -\eps$ fraction of random errors (for any $\eps>0$), the received word will have only the transmitted codeword in a Hamming ball of radius $\rho$ around it. Thus, for random errors, one can correct twice the number of errors uniquely correctable from worst-case errors for any code. A variant of our result also gives a simple algorithm to decode Reed-Solomon codes from random errors that, to the best of our knowledge, runs faster than known algorithms for certain ranges of parameters. (ii) We show that concatenated codes can achieve the list decoding capacity for erasures. A similar result for worst-case errors was proven by Guruswami and Rudra (SODA 08), although their result does not directly imply our result. Our results show that a subset of the random ensemble of codes considered by Guruswami and Rudra also achieves the list decoding capacity for erasures. Our proofs employ simple counting and probabilistic arguments.

Abstract:
The Lead-Based Multiple Video Transmission (LMVT) problem is motivated by applications in managing the quality of experience (QoE) of video streaming for mobile clients. In earlier work, the LMVT problem was shown to be NP-hard for a specific bit-to-lead conversion function $\phi$. In this work, we show the problem to be NP-hard even if the function $\phi$ is linear. We then design a fully polynomial time approximation scheme (FPTAS) for the problem. This problem is exactly equivalent to the Santa Claus Problem, on which there has been a lot of recent work.

Abstract:
We continue the study of the communication cost of computing functions when inputs are distributed among $k$ processors, each of which is located at one vertex of a network/graph called a terminal. Every other node of the network also has a processor, with no input. The communication is point-to-point and the cost is the total number of bits exchanged by the protocol, in the worst case, on all edges. Chattopadhyay, Radhakrishnan and Rudra (FOCS'14) recently initiated a study of the effect of topology of the network on the total communication cost using tools from $L_1$ embeddings. Their techniques provided tight bounds for simple functions like Element-Distinctness (ED), which depend on the 1-median of the graph. This work addresses two other kinds of natural functions. We show that for a large class of natural functions like Set-Disjointness the communication cost is essentially $n$ times the cost of the optimal Steiner tree connecting the terminals. Further, we show that for natural composed functions like $\text{ED} \circ \text{XOR}$ and $\text{XOR} \circ \text{ED}$, the naive protocols suggested by their definitions are optimal for general networks. Interestingly, the bounds for these functions depend on more involved topological parameters that are a combination of Steiner tree and 1-median costs. To obtain our results, we use some new tools in addition to the ones used in Chattopadhyay et al. These include (i) viewing the communication constraints via a linear program; (ii) using tools from the theory of tree embeddings to prove topology sensitive direct sum results that handle the case of composed functions and (iii) representing the communication constraints of certain problems as a family of collections of multiway cuts, where each multiway cut simulates the hardness of computing the function on the star topology.

Abstract:
We show that any $q$-ary code with sufficiently good distance can be randomly punctured to obtain, with high probability, a code that is list decodable up to radius $1 - 1/q - \epsilon$ with near-optimal rate and list sizes. Our results imply that "most" Reed-Solomon codes are list decodable beyond the Johnson bound, settling the long-standing open question of whether any Reed-Solomon codes meet this criterion. More precisely, we show that a Reed-Solomon code with random evaluation points is, with high probability, list decodable up to radius $1 - \epsilon$ with list sizes $O(1/\epsilon)$ and rate $\Omega(\epsilon)$. As a second corollary of our argument, we obtain improved bounds on the list decodability of random linear codes over large fields. Our approach exploits techniques from high dimensional probability. Previous work used similar tools to obtain bounds on the list decodability of random linear codes, but the bounds did not scale with the size of the alphabet. In this paper, we use a chaining argument to deal with large alphabet sizes.

Abstract:
Evaluating the relational join is one of the central and most well-studied algorithmic problems in database systems. A staggering number of variants have been considered, including Block-Nested loop join, Hash-Join, Grace, and Sort-merge. Commercial database engines use finely tuned join heuristics that take into account a wide variety of factors including the selectivity of various predicates, memory, IO, etc. In spite of this study of join queries, the textbook description of join processing is suboptimal. This survey describes recent results on join algorithms that have provable worst-case optimal runtime guarantees. We survey recent work and provide a simpler and unified description of these algorithms that we hope is useful for theory-minded readers, algorithm designers, and systems implementors.

Abstract:
We show that every almost universal hash function also has the storage enforcement property. Almost universal hash functions have found numerous applications and we show that this new storage enforcement property allows the application of almost universal hash functions in a wide range of remote verification tasks: (i) Proof of Secure Erasure (where we want to remotely erase and securely update the code of a compromised machine with a memory-bounded adversary), (ii) Proof of Ownership (where a storage server wants to check if a client has the data it claims to have before giving access to deduplicated data) and (iii) Data possession (where the client wants to verify whether the remote storage server is storing its data). Specifically, the storage enforcement guarantee in the classical data possession problem removes any practical incentive for the storage server to cheat the client by saving on storage space. The proof of our result relies on a natural combination of Kolmogorov Complexity and List Decoding. To the best of our knowledge, this is the first work that combines these two techniques. We believe the newly introduced storage enforcement property of almost universal hash functions will open promising avenues of research under the memory-bounded (bounded storage) adversary model.

Abstract:
The results in this paper have been merged with the results in arXiv:1002.3763v1. The authors would like to withdraw this version. Please see arXiv:1008.5356v1 for the merged version.