Control and Decision (控制与决策), 2013

Optimal control of a class of nonlinear dynamic systems based on reinforcement learning

pp. 1889-1893

Chen Xuesong (陈学松), Liu Fuchun (刘富春)

Keywords: nonlinear dynamic systems, reinforcement learning, optimal control, value function, policy function

Abstract:

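An optimal control method based on reinforcement learning is proposed for a class of nonlinear uncertain dynamic systems. The method uses an Euler reinforcement learning algorithm to estimate the unknown nonlinear function of the plant, and gives online learning rules for iterating the reward function and the policy function. The temporal-difference error that arises during learning is discretized with the forward Euler difference formula, which yields an estimate of the value function and an improvement of the control policy. Based on the gradient of the value function and the temporal-difference error index, the steps of the algorithm and an error estimation theorem are given. Simulation results on the mountain-car problem show that the proposed method is effective.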
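The abstract does not state the formulas, so the following is only a minimal sketch of how a forward Euler difference can discretize a temporal-difference error and drive value-function estimation. Everything in it is an assumption made for illustration: the continuous-time error form delta = r - V/tau + dV/dt, the discount time constant `tau`, the scalar test system dx/dt = -x with reward -x^2 (a toy stand-in, not the mountain-car benchmark), the quadratic value model V(x) = w * x^2, and the function names `euler_td_error` and `estimate_value_weight`. It is not the authors' algorithm and it omits the policy-improvement step.

```python
def euler_td_error(reward, v_now, v_next, dt, tau):
    """TD error delta = r - V/tau + dV/dt, with dV/dt replaced by the
    forward difference (V(t + dt) - V(t)) / dt. The error form is assumed."""
    return reward - v_now / tau + (v_next - v_now) / dt


def estimate_value_weight(dt=0.01, tau=1.0, alpha=0.1, episodes=20, steps=100):
    """Estimate w in the assumed model V(x) = w * x**2 for the toy system
    dx/dt = -x with reward r = -x**2, using semi-gradient TD updates."""
    w = 0.0
    for _ in range(episodes):
        x = 1.0  # every episode starts from the same state
        for _ in range(steps):
            x_next = x + dt * (-x)  # forward Euler step of dx/dt = -x
            reward = -x * x
            delta = euler_td_error(reward, w * x * x, w * x_next * x_next, dt, tau)
            w += alpha * delta * x * x  # dV/dw = x**2 for this model
            x = x_next
    return w


if __name__ == "__main__":
    w = estimate_value_weight()
    # Exact continuous-time weight for tau = 1 is -1/3; the Euler step
    # shifts the learned weight slightly, to -1 / (1/tau + 2 - dt).
    print("learned weight:", w)
```

For this toy system the discounted value can be integrated by hand, which is why it was chosen: the learned weight can be compared against -1/(1/tau + 2), and the small remaining gap is the discretization error introduced by the finite step `dt`.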
