[1] Barto A G, Mahadevan S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(1/2): 41-77
[2] Sutton R, Precup D, Singh S. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999, 112(1/2): 181-211
[3] Maron O, Lozano-Pérez T. A Framework for Multiple-Instance Learning // Jordan M I, Kearns M J, Solla S A, eds. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 1998, 10: 570-576
[4] McGovern E A. Autonomous Discovery of Temporal Abstractions from Interaction with an Environment. Ph.D. Dissertation. Amherst, USA: University of Massachusetts, Department of Computer Science, 2002
[5] Hengst B. Discovering Hierarchy in Reinforcement Learning with HEXQ // Proc of the 19th International Conference on Machine Learning. Sydney, Australia, 2002: 243-250
[6] Wang Bennian, Gao Yang, Chen Zhaoqian, et al. K-Cluster Subgoal Discovery Algorithm for Option. Journal of Computer Research and Development, 2006, 43(5): 851-855 (in Chinese) (王本年,高阳,陈兆乾,等. 面向Option的K聚类Subgoal发现算法. 计算机研究与发展, 2006, 43(5): 851-855)
[7] Parr R, Russell S. Reinforcement Learning with Hierarchies of Machines // Jordan M I, Kearns M J, Solla S A, eds. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 1998, 10: 1043-1049
[8] Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303