Watkins C J C H, Dayan P. Technical Note: Q-Learning. Machine Learning, 1992, 8(3): 279-292
[2]
Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, USA: MIT Press, 1998
[3]
Wu Q H. Reinforcement Learning Control Using Interconnected Learning Automata. International Journal of Control, 1995, 62(1): 1-16
[4]
Zhang Rubo, Gu Guochang, Liu Zhaode, et al. Reinforcement Learning Theory, Algorithms and Its Application. Control Theory and Applications, 2000, 17(5): 637-642 (in Chinese) (张汝波,顾国昌,刘照德,等.强化学习理论、算法及应用.控制理论与应用, 2000, 17(5): 637-642)
[5]
Fan Bo, Pan Quan, Zhang Hongcai. A Method to Design the Reward Function Based on Knowledge in Multi-Agent Learning. Computer Engineering and Applications, 2005, 41(3): 77-79 (in Chinese) (范 波,潘 泉,张洪才.多智能体学习中基于知识的强化函数设计方法.计算机工程与应用, 2005, 41(3): 77-79)
[6]
Zhang Rubo, Zhou Ning, Gu Guochang, et al. Reinforcement Learning Based Obstacle Avoidance Learning for Intelligent Robot. Robot, 1999, 21(3): 204-209 (in Chinese) (张汝波,周 宁,顾国昌,等.基于强化学习智能机器人避碰方法研究.机器人, 1999, 21(3): 204-209)
[7]
Yang Ming, Jia Li, Qiu Yuhui. Research on Automated Negotiation in Multi-Agent System Based on Reinforcement Learning. Computer Engineering and Applications, 2004, 40(33): 98-100, 117 (in Chinese) (杨 明,嘉 莉,邱玉辉.基于增强学习的多Agent自动协商研究.计算机工程与应用, 2004, 40(33): 98-100, 117)
[8]
Ma Shoufeng, Li Ying, Liu Bao. Agent-Based Learning Control Method for Urban Traffic Signal of Single Intersection. Journal of Systems Engineering, 2002, 17(6): 526-530 (in Chinese) (马寿峰,李 英,刘 豹.一种基于Agent的单路口交通信号学习控制方法.系统工程学报, 2002, 17(6): 526-530)
[9]
Jiang Guofei, Wu Cangpu. Learning to Control an Inverted Pendulum Using Q-Learning and Neural Networks. Acta Automatica Sinica, 1998, 24(5): 662-666 (in Chinese) (蒋国飞,吴沧浦.基于Q学习算法和BP神经元网络的倒立摆控制.自动化学报, 1998, 24(5): 662-666)