Nonlinear Control Based on Differential Game Theory: Review and Prospects

DOI: 10.3724/SP.J.1004.2014.00001, PP. 1-15

Keywords: differential games, nonlinear systems, equilibrium, HJI equation, cost function

Abstract:

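Differential games are a mathematical tool that uses differential equations to model continuous dynamic conflict, competition, or cooperation between two or more parties. They have been widely applied in biology, economics, international relations, computer science, military strategy, and many other fields. A differential game is essentially an optimal control problem with two or more players; it combines modern control theory with game theory, and so captures competition and confrontation more directly, and applies more broadly, than control theory alone. This paper traces the historical development of differential game theory from the perspectives of control, equilibrium, and algorithms in nonlinear differential games, reviews the essence of existing results and algorithms, and summarizes current research achievements. It closes with an outlook on the robustness and optimality of nonlinear systems based on differential game theory.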
