
Pattern Recognition and Artificial Intelligence, 2013

Heuristic Q-Learning Based on State Backtracking Cost Analysis

pp. 838-844

Fang Min, Li Hao

Keywords: cost analysis, heuristic function, state backtracking, Q-learning


Abstract:

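Because learning an action policy with reinforcement learning is time-consuming, a heuristic reinforcement learning method based on state backtracking is proposed. States that recur during learning are analyzed, and by comparing the selection strategies for repeated actions during state backtracking, a cost function is introduced to describe the importance of repeated actions. A new heuristic function is then defined by combining action reward with action cost. While emphasizing the importance of actions to speed up learning, this heuristic function uses the cost function to compute the cost of each action choice, which reduces unnecessary exploration and steadily improves learning efficiency. A proof is given for the action selection strategy based on the cost function. Two simulation scenarios are built, and the algorithm is applied to simulated robot path planning. The experimental results show that the method balances the reward obtained against the cost incurred and effectively improves the convergence speed of Q-learning.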
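The abstract does not give the definitions of the cost function or the heuristic function, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical grid world, a hypothetical cost equal to the fraction of times an action led back to a state already visited in the current episode, and a hypothetical weight `xi` that subtracts this cost from the Q-value during action selection. The Q-value update itself is the standard Q-learning rule.

```python
import random
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action, size, goal):
    """Hypothetical grid world: stay in place at a wall, reward 1 at the goal."""
    r, c = state[0] + action[0], state[1] + action[1]
    nxt = (r, c) if 0 <= r < size and 0 <= c < size else state
    return nxt, (1.0 if nxt == goal else 0.0), nxt == goal


def cost_of(stats, state, a):
    """Assumed cost: share of tries in which action a returned to a visited state."""
    tries, backs = stats[(state, a)]
    return backs / tries if tries else 0.0


def select_action(q, stats, state, xi, epsilon, rng):
    """Epsilon-greedy over Q(s, a) - xi * cost(s, a); this scoring is an assumption."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    scores = [q[(state, a)] - xi * cost_of(stats, state, a)
              for a in range(len(ACTIONS))]
    best = max(scores)
    return rng.choice([a for a, s in enumerate(scores) if s == best])


def train(size=5, episodes=200, alpha=0.5, gamma=0.9, xi=0.2, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)               # Q-table, zero-initialised
    stats = defaultdict(lambda: [0, 0])  # (state, action) -> [tries, backtracks]
    goal = (size - 1, size - 1)
    max_steps = 10 * size * size         # cap on episode length
    steps_per_episode = []
    for _ in range(episodes):
        state, visited, steps, done = (0, 0), {(0, 0)}, 0, False
        while not done and steps < max_steps:
            a = select_action(q, stats, state, xi, epsilon, rng)
            nxt, reward, done = step(state, ACTIONS[a], size, goal)
            # Record whether this action backtracked to an already visited state.
            stats[(state, a)][0] += 1
            if nxt in visited:
                stats[(state, a)][1] += 1
            # Standard Q-learning update.
            best_next = max(q[(nxt, b)] for b in range(len(ACTIONS)))
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            visited.add(nxt)
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `train()` returns the Q-table and the number of steps taken in each episode; setting `xi=0` in this sketch disables the cost term and reduces it to plain epsilon-greedy Q-learning, which gives a baseline for comparison.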

