A Class of Hybrid Transfer Algorithms for Reinforcement Learning Based on Spectral Methods

DOI: 10.3724/SP.J.1004.2012.01765, PP. 1765-1776

Keywords: reinforcement learning, transfer learning, spectral graph theory, proto-value function, hierarchical decomposition


Abstract:

In transfer tasks where the state space is scaled up proportionally, the proto-value function method can only effectively transfer the basis functions corresponding to the smaller eigenvalues; when these basis functions are used for value function approximation in the target task, the value functions of some states become erroneous. To address this problem, the hierarchical decomposition technique based on spectral graph theory is improved by exploiting the property of the Laplacian eigenmap that it preserves the local topological structure of the state space, and a hybrid transfer method that combines basis functions with optimal subtask policies is proposed. First, basis functions are obtained in the source task by the spectral method and then extended to basis functions of the target task by linear interpolation. Next, the interpolated second-order basis function (the approximate Fiedler eigenvector of the target task) is used to decompose the task, and the optimal policies of the relevant subtasks are obtained with the improved hierarchical decomposition technique. Finally, the extended basis functions and the obtained subtask policies are used together in learning the target task. The proposed hybrid transfer method can directly determine the optimal policy for part of the target task's state space, reduces the minimum number of basis functions required for value function approximation, and lowers the number of policy iterations; it is suited to transfer tasks whose state space is scaled up proportionally and which have a hierarchical structure. Simulation results on grid worlds verify the effectiveness of the new method.
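The basis-function transfer step described above can be illustrated with a short sketch. The Python fragment below (a minimal illustration, not the paper's implementation) computes proto-value functions of a small obstacle-free grid world as the low-order eigenvectors of its normalized graph Laplacian, bilinearly interpolates them onto a proportionally scaled target grid, and uses the sign of the interpolated second eigenvector (the approximate Fiedler vector) to split the target state space into two subtasks. The grid sizes, the scaling factor, and all function names are illustrative assumptions.

import numpy as np
from scipy.ndimage import zoom
from scipy.sparse.csgraph import laplacian


def grid_adjacency(rows, cols):
    # 4-neighbour adjacency matrix of an obstacle-free rows x cols grid world
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:
                j = (r + 1) * cols + c
                A[i, j] = A[j, i] = 1.0
            if c + 1 < cols:
                j = r * cols + (c + 1)
                A[i, j] = A[j, i] = 1.0
    return A


def proto_value_functions(rows, cols, k):
    # Proto-value functions: eigenvectors of the normalized graph Laplacian
    # associated with the k smallest eigenvalues (Laplacian eigenmap)
    L = laplacian(grid_adjacency(rows, cols), normed=True)
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, :k]  # columns sorted by increasing eigenvalue


def transfer_basis(pvfs, src_shape, scale):
    # Extend source basis functions to a proportionally scaled target grid
    # by bilinear interpolation (order=1), as in the abstract's first step
    target = []
    for i in range(pvfs.shape[1]):
        phi = pvfs[:, i].reshape(src_shape)
        target.append(zoom(phi, scale, order=1).ravel())
    return np.column_stack(target)


if __name__ == "__main__":
    src_shape, scale, k = (10, 10), 2, 6  # illustrative sizes, not from the paper
    pvfs_src = proto_value_functions(*src_shape, k)
    pvfs_tgt = transfer_basis(pvfs_src, src_shape, scale)

    # Approximate Fiedler vector of the target task: its sign pattern splits
    # the scaled state space into two subtasks for hierarchical decomposition
    fiedler_tgt = pvfs_tgt[:, 1]
    subtask_of_state = (fiedler_tgt > 0).astype(int)
    print(pvfs_tgt.shape, np.bincount(subtask_of_state))

In the method the abstract outlines, the interpolated Fiedler vector drives the improved hierarchical decomposition that yields the subtask policies, while the full set of extended basis functions is reused for value function approximation when learning the target task.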

