An Option Subgoal Discovery Algorithm Based on Exploration Density

PP. 236-240

Keywords: hierarchical reinforcement learning, Option, exploration density (ED)

Abstract:

We propose the concept of state exploration density (ED): by measuring how each state affects the agent's ability to explore its environment, the algorithm discovers learning subgoals and constructs the corresponding Options. A reinforcement learning algorithm that creates Options in this way can effectively speed up learning. The method is task-independent and requires no prior knowledge, and the Options it constructs can be shared directly between different tasks in the same environment.
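
The abstract gives only the intuition behind ED, not its formal definition, so the sketch below is a speculative reading rather than the paper's actual algorithm: it scores a state by how much of the environment becomes unreachable when that state is removed, i.e. by the state's influence on the agent's ability to explore. The two-room grid world, the names TwoRoomGrid, reachable, and exploration_density, and the use of a known transition model are all assumptions made for illustration; the paper presumably estimates this influence from the agent's sampled experience instead.

from collections import deque

# Hypothetical two-room grid world (illustration only, not from the paper):
# two 5x5 rooms joined by a single doorway cell -- the classic bottleneck
# that a subgoal-discovery method is expected to find.
class TwoRoomGrid:
    def __init__(self):
        self.door = (2, 5)
        self.cells = {(r, c) for r in range(5) for c in range(11)
                      if c != 5} | {self.door}
        self.start = (0, 0)

    def neighbors(self, s):
        r, c = s
        for s2 in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if s2 in self.cells:
                yield s2

def reachable(env, blocked=None):
    # Breadth-first flood fill from the start state, treating `blocked`
    # as a wall.
    seen, frontier = {env.start}, deque([env.start])
    while frontier:
        for s2 in env.neighbors(frontier.popleft()):
            if s2 != blocked and s2 not in seen:
                seen.add(s2)
                frontier.append(s2)
    return seen

def exploration_density(env):
    # Assumed ED measure: the number of OTHER states that become
    # unreachable when a state is blocked. Gateway states score high;
    # ordinary interior states score 0.
    base = len(reachable(env))
    return {s: base - len(reachable(env, blocked=s)) - 1
            for s in env.cells if s != env.start}

env = TwoRoomGrid()
ed = exploration_density(env)
print(sorted(ed, key=ed.get, reverse=True)[:3])
# -> the doorway corridor (2, 4), (2, 5), (2, 6): natural Option subgoals

In this reading, the top-ED states would serve as Option termination (subgoal) states, with each Option's internal policy learned separately, e.g. by Q-learning toward the subgoal as a pseudo-reward. Because ED depends only on the environment's connectivity and not on any reward function, the resulting Options carry over unchanged between tasks in the same environment, matching the sharing property claimed in the abstract.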
