(Chen X S, Yang Y M. Multi-step temporal difference learning algorithm based on recursive least-squares method[J]. Computer Engineering and Applications, 2010, 48(8): 52-55.)
[7]
Barto A G, Sutton R S, Anderson C W. Neuronlike adaptive elements that can solve difficult learning control problems[J]. IEEE Trans on Systems, Man and Cybernetics, 1983, 13(5): 834-846.
[8]
Zhang H, Wei Q, Liu D. An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games[J]. Automatica, 2011, 47(1): 207-214.
[9]
Vamvoudakis K G, Lewis F L. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem[J]. Automatica, 2010, 46(5): 878-888.
[10]
Bhasin S, Sharma N, Patre P, et al. Asymptotic tracking by a reinforcement learning-based adaptive critic controller[J]. J of Control Theory and Applications, 2011, 9(3): 400-409.
[11]
Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. Cambridge: MIT Press, 1998: 55-68.
[12]
Schaal S, Atkeson C. Learning control in robotics[J]. IEEE Robotics and Automation Magazine, 2010, 17(2): 20-29.
[13]
Dung L T, Komeda T, Takagi M. Reinforcement learning for POMDP using state classification[J]. Applied Artificial Intelligence, 2008, 22(7): 761-779.
[14]
Busoniu L, Babuska R, De Schutter B. A comprehensive survey of multi-agent reinforcement learning[J]. IEEE Trans on Systems, Man and Cybernetics, Part C: Applications and Reviews, 2008, 38(2): 156-172.