Abstract:
We study online learnability of a wide class of problems, extending the results of (Rakhlin, Sridharan, Tewari, 2010) to general notions of performance measure well beyond external regret. Our framework simultaneously captures such well-known notions as internal and general Phi-regret, learning with non-additive global cost functions, Blackwell's approachability, calibration of forecasters, adaptive regret, and more. We show that learnability in all these situations is due to control of the same three quantities: a martingale convergence term, a term describing the ability to perform well if future is known, and a generalization of sequential Rademacher complexity, studied in (Rakhlin, Sridharan, Tewari, 2010). Since we directly study complexity of the problem instead of focusing on efficient algorithms, we are able to improve and extend many known results which have been previously derived via an algorithmic construction.

Abstract:
We resolve an open question from (Christiano, 2014b) posed in COLT'14 regarding the optimal dependency of the regret achievable for online local learning on the size of the label set. In this framework the algorithm is shown a pair of items at each step, chosen from a set of $n$ items. The learner then predicts a label for each item, from a label set of size $L$ and receives a real valued payoff. This is a natural framework which captures many interesting scenarios such as collaborative filtering, online gambling, and online max cut among others. (Christiano, 2014a) designed an efficient online learning algorithm for this problem achieving a regret of $O(\sqrt{nL^3T})$, where $T$ is the number of rounds. Information theoretically, one can achieve a regret of $O(\sqrt{n \log L T})$. One of the main open questions left in this framework concerns closing the above gap. In this work, we provide a complete answer to the question above via two main results. We show, via a tighter analysis, that the semi-definite programming based algorithm of (Christiano, 2014a), in fact achieves a regret of $O(\sqrt{nLT})$. Second, we show a matching computational lower bound. Namely, we show that a polynomial time algorithm for online local learning with lower regret would imply a polynomial time algorithm for the planted clique problem which is widely believed to be hard. We prove a similar hardness result under a related conjecture concerning planted dense subgraphs that we put forth. Unlike planted clique, the planted dense subgraph problem does not have any known quasi-polynomial time algorithms. Computational lower bounds for online learning are relatively rare, and we hope that the ideas developed in this work will lead to lower bounds for other online learning scenarios as well.

Abstract:
We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the best loss she would have achieved by picking, in hindsight, the best possible action. Our goal is to understand the magnitude of the best possible (minimax) regret. We study the problem under three different assumptions for the feedback the decision maker receives: full information, and the partial information models of the so-called "semi-bandit" and "bandit" problems. Combining the Mirror Descent algorithm and the INF (Implicitely Normalized Forecaster) strategy, we are able to prove optimal bounds for the semi-bandit case. We also recover the optimal bounds for the full information setting. In the bandit case we discuss existing results in light of a new lower bound, and suggest a conjecture on the optimal regret in that case. Finally we also prove that the standard exponentially weighted average forecaster is provably suboptimal in the setting of online combinatorial optimization.

Abstract:
Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm's ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm's actions. We define the alternative notion of policy regret, which attempts to provide a more meaningful way to measure an online algorithm's performance against adaptive adversaries. Focusing on the online bandit setting, we show that no bandit algorithm can guarantee a sublinear policy regret against an adaptive adversary with unbounded memory. On the other hand, if the adversary's memory is bounded, we present a general technique that converts any bandit algorithm with a sublinear regret bound into an algorithm with a sublinear policy regret bound. We extend this result to other variants of regret, such as switching regret, internal regret, and swap regret.

Abstract:
This paper considers the stability of online learning algorithms and its implications for learnability (bounded regret). We introduce a novel quantity called {\em forward regret} that intuitively measures how good an online learning algorithm is if it is allowed a one-step look-ahead into the future. We show that given stability, bounded forward regret is equivalent to bounded regret. We also show that the existence of an algorithm with bounded regret implies the existence of a stable algorithm with bounded regret and bounded forward regret. The equivalence results apply to general, possibly non-convex problems. To the best of our knowledge, our analysis provides the first general connection between stability and regret in the online setting that is not restricted to a particular class of algorithms. Our stability-regret connection provides a simple recipe for analyzing regret incurred by any online learning algorithm. Using our framework, we analyze several existing online learning algorithms as well as the "approximate" versions of algorithms like RDA that solve an optimization problem at each iteration. Our proofs are simpler than existing analysis for the respective algorithms, show a clear trade-off between stability and forward regret, and provide tighter regret bounds in some cases. Furthermore, using our recipe, we analyze "approximate" versions of several algorithms such as follow-the-regularized-leader (FTRL) that requires solving an optimization problem at each step.

Abstract:
In citep{Hazan-2008-extract}, the authors showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of the algorithm in \citep{Hazan-2008-extract} when applied it to online convex optimization. We then present two algorithms for online convex optimization whose regrets are bounded by the variation of cost functions. We finally consider the bandit setting, and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that the regret bound for online bandit convex optimization is optimal when the variation of cost functions is independent of the number of trials.

Abstract:
Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this setting unless constraints on the comparator point x^* are known in advance. We present algorithms that, without such prior knowledge, offer near-optimal regret bounds with respect to any choice of x^*. In particular, regret with respect to x^* = 0 is constant. We then prove lower bounds showing that our guarantees are near-optimal in this setting.

Abstract:
This paper describes a new online convex optimization method which incorporates a family of candidate dynamical models and establishes novel tracking regret bounds that scale with the comparator's deviation from the best dynamical model in this family. Previous online optimization methods are designed to have a total accumulated loss comparable to that of the best comparator sequence, and existing tracking or shifting regret bounds scale with the overall variation of the comparator sequence. In many practical scenarios, however, the environment is nonstationary and comparator sequences with small variation are quite weak, resulting in large losses. The proposed Dynamic Mirror Descent method, in contrast, can yield low regret relative to highly variable comparator sequences by both tracking the best dynamical model and forming predictions based on that model. This concept is demonstrated empirically in the context of sequential compressive observations of a dynamic scene and tracking a dynamic social network.

Abstract:
We explore the construction of nonsubjective prior distributions in Bayesian statistics via a posterior predictive relative entropy regret criterion. We carry out a minimax analysis based on a derived asymptotic predictive loss function and show that this approach to prior construction has a number of attractive features. The approach here differs from previous work that uses either prior or posterior relative entropy regret in that we consider predictive performance in relation to alternative nondegenerate prior distributions. The theory is illustrated with an analysis of some specific examples.

Abstract:
We present an algorithm which attains O(\sqrt{T}) internal (and thus external) regret for finite games with partial monitoring under the local observability condition. Recently, this condition has been shown by (Bartok, Pal, and Szepesvari, 2011) to imply the O(\sqrt{T}) rate for partial monitoring games against an i.i.d. opponent, and the authors conjectured that the same holds for non-stochastic adversaries. Our result is in the affirmative, and it completes the characterization of possible rates for finite partial-monitoring games, an open question stated by (Cesa-Bianchi, Lugosi, and Stoltz, 2006). Our regret guarantees also hold for the more general model of partial monitoring with random signals.