%0 Journal Article
%T Contextual Bandits with Global Constraints and Objective
%A Shipra Agrawal
%A Nikhil R. Devanur
%A Lihong Li
%J Computer Science
%D 2015
%I arXiv
%X We consider the contextual version of a multi-armed bandit problem with global convex constraints and concave objective function. In each round, the outcome of pulling an arm is a context-dependent vector, and the global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. The learning agent competes with an arbitrary set of context-dependent policies. This problem is a common generalization of problems considered by Badanidiyuru et al. (2014) and Agrawal and Devanur (2014), with important applications. We give computationally efficient algorithms with near-optimal regret, generalizing the approach of Agarwal et al. (2014) for the non-constrained version of the problem. For the special case of budget constraints our regret bounds match those of Badanidiyuru et al. (2014), answering their main open question of obtaining a computationally efficient algorithm.
%U http://arxiv.org/abs/1506.03374v1