Baxter J, Bartlett P L. Infinite-Horizon Policy-Gradient Estimation. Journal of Artificial Intelligence Research, 2001, 15(4): 319-350
[2]
Ghavamzadeh M. Hierarchical Reinforcement Learning in Continuous State and Multi-Agent Environments. Ph.D Dissertation. Amherst, USA: University of Massachusetts. Graduate School, 2005
[3]
Ghavamzadeh M, Mahadevan S. Hierarchical Policy Gradient Algorithms // Proc of the 20th International Conference on Machine Learning. Washington, USA, 2003: 226-233
[4]
Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303
[5]
Ghavamzadeh M, Mahadevan S, Makar R. Hierarchical Multiagent Reinforcement Learning. Journal of Autonomous Agents and Multi-Agent Systems, 2006, 13(2): 197-229
[6]
Hu Xiaohui, Shi Yuhui, Eberhart R. Recent Advances in Particle Swarm // Proc of the IEEE Congress on Evolutionary Computation. Portland, USA, 2004, Ⅰ: 90-97
[7]
Peng Zhiping, Peng Hong, Zheng Qilun. Study on Bilateral and Multi-Issue Autonomous Negotiation Model. Journal of Electronics & Information Technology, 2007, 29(3): 733-738 (in Chinese) (彭志平,彭 宏,郑启伦.一种双边多议题自治协商模型的研究.电子与信息学报, 2007, 29(3): 733-738)
[8]
Gao Yang, Chen Shifu, Lu Xin. Research on Reinforcement Learning Technology: A Review. Acta Automatica Sinica, 2004, 30(1): 86-100 (in Chinese) (高 阳,陈世福,陆 鑫.强化学习研究综述.自动化学报, 2004, 30(1): 86-100)
[9]
Barto A G, Mahadevan S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77
[10]
Li Wei, Ye Qingtai, Zhu Changming. Application of Hierarchical Reinforcement Learning in Engineering Domain. Journal of Systems Science and Systems Engineering, 2005, 14(2): 207-217
[11]
Puterman M. Markov Decision Processes. New York, USA: Wiley, 1994
[12]
Su Chang, Gao Yang, Chen Shifu, et al. The Study of Recognizing Options Based on SMDP. Pattern Recognition and Artificial Intelligence, 2005, 18(6): 679-684 (in Chinese) (苏 畅,高 阳,陈世福,等.基于SMDP环境的自主生成options算法的研究.模式识别与人工智能, 2005, 18(6): 679-684)
[13]
Watkins C J C H, Dayan P. Q-Learning. Machine Learning, 1992, 8(3): 279-292