Distributed Optimal Tracking Control of Multi-Agent Partially Unknown Linear Discrete-Time Systems Based on Zero-Sum Games
Abstract:
This paper studies the distributed optimal tracking control problem for uncertain linear discrete-time systems subject to external disturbances. Existing studies require the system dynamics to be known and do not prove that the optimal solution is a Nash equilibrium. Because the control policies and the disturbances compete with each other, the problem is first recast as a multi-agent zero-sum game. Based on a newly proposed performance index, an inner-outer loop algorithm is used to solve the Hamilton-Jacobi-Isaacs (HJI) equations iteratively, and its convergence is proven. The optimal solution obtained by the algorithm is further shown to be the Nash equilibrium of the zero-sum game. When the system is not fully known, a single-layer neural network can be used to approximate the real-valued function, which reduces the computational complexity compared with the existing three-layer networks. Finally, simulations verify the effectiveness of the method.
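To make the zero-sum formulation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a single scalar agent with fully known dynamics x_{k+1} = a x_k + b u_k + d w_k, a stage cost q x^2 + r u^2 - gamma^2 w^2, and a quadratic value function V(x) = p x^2. It runs plain value iteration on the resulting game Riccati equation instead of the inner-outer loop scheme, and it uses no neural network. All numerical values and function names are hypothetical.

```python
def saddle_gains(p, a, b, d, r, gamma):
    """Gains of the saddle-point policies u = -k_u * x and w = -k_w * x
    for the value function V(x) = p * x**2 (scalar system assumed)."""
    # 2x2 matrix coupling the control (minimizer) and disturbance (maximizer).
    m11 = r + b * b * p
    m12 = b * d * p
    m22 = d * d * p - gamma ** 2
    det = m11 * m22 - m12 * m12
    # Right-hand side of the stationarity conditions.
    rhs_u = b * p * a
    rhs_w = d * p * a
    # Solve the 2x2 linear system with Cramer's rule.
    k_u = (rhs_u * m22 - m12 * rhs_w) / det
    k_w = (m11 * rhs_w - m12 * rhs_u) / det
    return k_u, k_w


def bellman_update(p, a, b, d, q, r, gamma):
    """One value-iteration step: evaluate the stage cost plus next-step value
    under the saddle-point policies derived from the current p."""
    k_u, k_w = saddle_gains(p, a, b, d, r, gamma)
    a_cl = a - b * k_u - d * k_w  # closed-loop dynamics under both policies
    return q + r * k_u ** 2 - gamma ** 2 * k_w ** 2 + a_cl ** 2 * p


def solve_zero_sum_game(a, b, d, q, r, gamma, tol=1e-12, max_iter=10_000):
    """Iterate from p = 0 until successive values differ by less than tol."""
    p = 0.0
    for _ in range(max_iter):
        p_next = bellman_update(p, a, b, d, q, r, gamma)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    k_u, k_w = saddle_gains(p, a, b, d, r, gamma)
    return p, k_u, k_w


# Hypothetical parameters chosen only for illustration.
A, B, D, Q, R, GAMMA = 0.9, 1.0, 0.5, 1.0, 1.0, 2.0
p_star, k_u_star, k_w_star = solve_zero_sum_game(A, B, D, Q, R, GAMMA)
a_closed = A - B * k_u_star - D * k_w_star
```

For these assumed parameters the iteration settles on a positive p with d^2 p - gamma^2 < 0, which is the condition under which the control minimizes and the disturbance maximizes the same quadratic form, so the pair of policies is a saddle point (the Nash equilibrium of the scalar zero-sum game). The closed-loop coefficient `a_closed` then has magnitude below one.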