%0 Journal Article
%T Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games
%A XIAO Zheng
%A ZHANG Shi-Yong
%J Journal of Software
%D 2008
%X For conflict games, a rational but conservative action-selection method is investigated, namely minimizing the regret function in the worst case. Under this policy, the loss that may be incurred in the future is the lowest, and a Nash equilibrium mixed policy is obtained without information about the other agents. Based on regret, a reinforcement learning model and an algorithm for conflict games in complex multi-agent environments are proposed. The model also builds the agents' belief-updating process on the concept of cross-entropy distance, which further optimizes the action-selection policy for conflict games. Based on the Markov repeated game model, this paper demonstrates the convergence of the algorithm and analyzes the relationship between belief and the optimal policy. Additionally, compared with the extended Q-learning algorithm under MMDP (multi-agent Markov decision process), the proposed algorithm dramatically decreases the number of conflicts, enhances coordination among agents, improves system performance, and helps maintain system stability.
%K Markov game
%K reinforcement learning
%K conflict game
%K conflict resolving
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=300B49EEF3275D00FEE13B2A5E540111&yid=67289AFF6305E306&vid=2A8D03AD8076A2E3&iid=708DD6B15D2464E8&sid=9D3B6F38EE3E2C55&eid=FDA8066B1D4E2F67&journal_id=1000-9825&journal_name=软件学报&referenced_num=0&reference_num=18