[1] P. J. Werbos. A Menu of Designs for Reinforcement Learning Over Time, in Neural Networks for Control[M], W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge, MA: MIT Press, 1990: 67-95.
[2] P. J. Werbos. Approximate Dynamic Programming for Real-time Control and Neural Modeling, in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches[M] (Chapter 13), D. A. White and D. A. Sofge, Eds. New York, NY: Van Nostrand Reinhold, 1992: 493-525.
[3] Danil V. Prokhorov, Donald C. Wunsch. Adaptive Critic Designs[J]. IEEE Transactions on Neural Networks, 1997, 8(5): 997-1007.
[4] George G. Lendaris, Christian Paintz. Training Strategies for Critic and Action Neural Networks in Dual Heuristic Programming Method[C]. Proceedings of the 1997 IEEE International Conference on Neural Networks. Houston, TX, 1997: 712-717.
[5] Jennie Si, Yu-Tsung Wang. On-Line Learning Control by Association and Reinforcement[J]. IEEE Transactions on Neural Networks, 2001, 12(2): 264-276.
[6] Derong Liu, Xiaoxu Xiong, Yi Zhang. Action-Dependent Adaptive Critic Designs[C]. Proceedings of the 2001 IEEE International Conference on Neural Networks. Washington, D.C., 2001: 990-995.
[7] Derong Liu, Ning Jin. ε-Adaptive Dynamic Programming for Discrete-Time Systems[C]. 2008 International Joint Conference on Neural Networks (IJCNN 2008). 2008: 1417-1424.
[8] Murad Abu-Khalaf, Frank L. Lewis. Nearly Optimal Control Laws for Nonlinear Systems with Saturating Actuators Using a Neural Network HJB Approach[J]. Automatica, 2005, 41: 779-791.
[9] Huaguang Zhang, Qinglai Wei, Yanhong Luo. A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 937-942.
[10] Asma Al-Tamimi, Frank L. Lewis, Murad Abu-Khalaf. Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 943-949.