
A Reinforcement Learning Method for LQR Control Problems

pp. 406-411

Keywords: reinforcement learning, recursive least squares, TD learning, optimal control


Abstract:

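Convergence analyses of existing reinforcement learning methods mostly address discrete-state problems; for continuous-state problems, convergence analysis is limited to the simple LQR control problem. This paper analyzes two existing reinforcement learning methods that converge for the LQR problem and, to address their shortcomings, proposes a reinforcement learning method that requires only partial model information. The method uses recursive least-squares TD (RLSTD) to estimate the value-function parameters and recursive least squares (RLS) to estimate the greedy improved policy. A theoretical analysis of the method's convergence under ideal conditions is given. Simulation experiments show that the method converges to the optimal control policy.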
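To make the evaluate-then-improve structure described in the abstract concrete, the following is a minimal sketch on a scalar, noise-free LQR problem. It is not the paper's algorithm: the abstract gives no equations, so the system constants, the function names `rlstd_evaluate` and `policy_iteration`, the restart states, and the undiscounted quadratic feature x^2 are all illustrative assumptions. In particular, the improvement step here uses the full model (a, b) in closed form, whereas the paper's method needs only partial model information and estimates the greedy policy with RLS.

```python
def rlstd_evaluate(a, b, q, rho, k, starts, steps=50, delta=1e6):
    """Estimate p in V(x) = p * x**2 for the fixed policy u = -k * x.

    Scalar system x' = a*x + b*u with cost r = q*x**2 + rho*u**2
    (all constants are illustrative assumptions). Uses a recursive
    least-squares TD(0) update with the single feature phi(x) = x**2.
    """
    theta = 0.0   # current estimate of p
    P = delta     # scalar "inverse correlation" term, large initial value
    for x in starts:
        for _ in range(steps):
            u = -k * x
            r = q * x * x + rho * u * u
            x_next = a * x + b * u
            phi = x * x
            d = phi - x_next * x_next          # phi(x) - phi(x'), undiscounted
            gain = P * phi / (1.0 + d * P * phi)
            theta += gain * (r - d * theta)    # correct by the TD residual
            P -= gain * d * P                  # Sherman-Morrison style update
            x = x_next
    return theta


def policy_iteration(a, b, q, rho, k0, iterations=10):
    """Alternate RLSTD evaluation with a greedy gain update.

    Assumption: the greedy step uses the known model (a, b); k0 must be
    stabilizing, i.e. abs(a - b*k0) < 1.
    """
    k = k0
    p = 0.0
    starts = [1.0, -2.0, 0.5]   # hypothetical restart states for excitation
    for _ in range(iterations):
        p = rlstd_evaluate(a, b, q, rho, k, starts)
        k = a * b * p / (rho + b * b * p)      # greedy gain for V(x) = p*x**2
    return k, p


if __name__ == "__main__":
    k, p = policy_iteration(a=0.9, b=0.5, q=1.0, rho=1.0, k0=0.3)
    print("estimated gain:", k, "value parameter:", p)
```

In this noise-free setting the TD relation p * (x^2 - x'^2) = r holds exactly for a stabilizing gain, so the recursive estimate settles after a few samples and the loop behaves like classical policy iteration on the scalar Riccati equation. With process noise or an unknown input gain, the evaluation and improvement steps would need the treatment the paper itself provides.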

