Zhao Jin,Liu Weiyi,Jian Jinjian. State-Clusters Shared Cooperative Multiagent Reinforcement Learning // Proc of the 7th Asian Control Conference. Hong Kong,China,2009: 129-135
[2]
Van Seijen H,Whiteson S,van Hasselt H. Exploiting Best-Match Equations for Efficient Reinforcement Learning. Machine Learning Research,2011,12(6): 2045-2094
[3]
Fang Min,Li Hao,Zhang Xiaosong. A Heuristic Reinforcement Learning Based on State Backtracking Method // Proc of the IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology. Macau,China,2012: 673-678
[4]
Gao Yang,Chen Shifu,Lu Xin. A Survey of Reinforcement Learning. Acta Automatiea Sinica,2004,30(1): 86-100 (in Chinese)(高 阳,陈世福,陆 鑫.强化学习研究综述.自动化学报,2004,30(1): 86-100)
[5]
Busoniu L,Babuska R,de Schutter B. A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Trans on Systems,Man,and Cybernetics,2008,38(2): 156-172
[6]
Stone P,Sutton R S,Kuhlmann G. Reinforcement Learning for RoboCup-Soccer Keepaway. Adaptive Behavior,2005,13(3): 165-188
[7]
Ota J. Multiagent Robot Systems as Distributed Autonomous Systems. Advanced Engineering Informatics,2006,20(1): 59-70
[8]
Bianchi R A C,Ribeiro C H C ,Costa A H R. Heuristically Accelerated Q-Learning: A New Approach to Speed up Reinforcement Learning // Proc of the 17th Brazilian Symposinm on Artificial Intelligence. Maranhao,Brazil,2004: 245-254
[9]
Barto A G,Mahadevan S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications,2003,13(1/2): 41-77
[10]
Marthi B. Automatic Shaping and Decomposition of Reward Functions // Proc of the 24th International Conference on Machine Learning. Corvallis,USA,2007: 601-608
[11]
Torrey L,Shavlik J,Walker T,et al. Skill Acquisition via Transfer Learning and Advice Taking // Proc of the 17th European Conference on Machine Learning. Berlin,Germany,2006: 425-436
[12]
Bianchi R A C,Ribeiro C H C,Costa A H R. Accelerating Autonomous Learning by Using Heuristic Selection of Actions. Journal of Heuristics,2008,14(2): 135-168
[13]
Liu Quan,Gao Yang,Chen Daoxu,et al. A Logical Reinforcement Learning Method Based on Heuristic Contour List. Journal of Computer Research and Development,2008,45(11): 1824-1830 (in Chinese)(刘 全,高 阳,陈道蓄,等.一种基于启发式轮廓表的逻辑强化学习方法.计算机研究与发展,2008,45(11): 1824-1830)
[14]
Liu Quan,Fu Qiming,Gong Shengrong,et al. Reinforcement Learning Algorithm Based on Minimum State Method and Average Reward. Journal on Communications,2011,32(1): 66-71 (in Chinese)(刘 全,傅启明,龚声蓉,等.最小状态变元平均奖赏的强化学习方法.通信学报,2011,32(1): 66-71)
[15]
Wei Yingzi,Zhao Mingyang. Design and Convergence Analysis of Heuristic Reward Function for Reinforcement Learning Algorithms. Computer Science,2005,32(3): 190-193 (in Chinese)(魏英姿,赵明扬.强化学习算法中启发式回报函数的设计及其收敛性分析.计算机科学,2005,32(3): 190-193)