%0 Journal Article %T A Necessary Condition for Nash Equilibrium in Two-Person Zero-Sum Constrained Stochastic Games %A Hyeong Soo Chang %J Game Theory %D 2013 %R 10.1155/2013/290427 %X We provide a necessary condition that a constrained Nash-equilibrium (CNE) policy pair satisfies in two-person zero-sum constrained stochastic discounted-payoff games and discuss a general method of approximating CNE based on the condition. 1. Introduction Altman and Shwartz [1] established a sufficient condition for the existence of a stationary Markovian constrained Nash equilibrium (CNE) policy pair in a general model of finite two-person zero-sum constrained stochastic games, and Alvarez-Mena and Hernández-Lerma [2] extended the result to infinite state and action spaces. Even though a few computational studies exist for average-payoff models with additional simplifying assumptions (see, e.g., [3–5]), there seems to be no work providing a meaningful necessary condition for CNE, or any general approximation scheme for CNE, within the general discounted-cost model. This brief paper establishes a necessary condition that a CNE policy pair satisfies via a novel characterization of the set of all feasible policies of one player when the other player's policy is fixed. This is done by identifying the feasible mixed actions of one player at the current state when the expected total discounted constraint cost from each reachable next state is given by a value function defined over the state space. The necessary condition provides a general method of testing whether a given policy pair is a CNE policy pair and can induce a general approximation scheme for CNE. 2. Preliminaries Consider a two-person zero-sum Markov game (MG) (X, A, B, P, C) [6], where X is a finite state set, and A(x) and B(x) are nonempty finite pure-action sets for the minimizer and the maximizer, respectively, at x in X, with A = ∪_{x∈X} A(x) and B = ∪_{x∈X} B(x). We denote the mixed action sets at x in X over A(x) and B(x) to be Φ(A(x)) and Φ(B(x)), respectively, with Φ(A) = ∪_{x∈X} Φ(A(x)) and Φ(B) = ∪_{x∈X} Φ(B(x)). 
Once φ in Φ(A(x)) and ψ in Φ(B(x)) are simultaneously taken at x in X by the minimizer and the maximizer, respectively (with complete knowledge of the state but without knowing each other's current action being taken), the MG makes a transition to a next state y by the probability given as P_{xy}(φ, ψ) = Σ_{a∈A(x)} Σ_{b∈B(x)} φ(a) ψ(b) P_{xy}(a, b). Here φ(a) denotes the probability of selecting a, and similarly for ψ(b), and P_{xy}(a, b) denotes the probability of moving from x to y by a and b. Then the minimizer obtains an expected cost of C(x, φ, ψ) given by C(x, φ, ψ) = Σ_{a∈A(x)} Σ_{b∈B(x)} φ(a) ψ(b) C(x, a, b), where C(x, a, b) in R is a payoff to the minimizer (the negative of this will be incurred to the maximizer). We define a stationary Markovian policy π of the minimizer as a function with π(x) ∈ Φ(A(x)) for all x in X, and we denote Π to be the set of all possible such policies. A policy μ is similarly defined for the maximizer with μ(x) ∈ Φ(B(x)), and we denote M to be the set of all possible such policies. Define the objective value of π in Π and μ in M with an %U http://www.hindawi.com/journals/gt/2013/290427/