Linear Quadratic Zero-Sum Games for Time-Varying Systems Based on Singular-Perturbation Reinforcement Learning
Abstract:
This paper studies linear quadratic zero-sum games for time-varying systems. Unlike earlier methods that depend on a system model, it proposes a model-free reinforcement learning algorithm for finding the Nash equilibrium. First, singular perturbation theory is used to transform the time-varying dynamic game into game problems for two time-invariant systems. A model-free reinforcement learning algorithm then determines the Nash equilibria of these two time-invariant systems, which together approximate the Nash equilibrium of the original time-varying system. The proposed algorithmic framework offers a new research direction for reinforcement-learning-based robust control of time-varying systems and for resilient control of cyber-physical systems.
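For orientation, the Nash equilibrium of a single time-invariant linear quadratic zero-sum game can be characterized by a game algebraic Riccati equation. The sketch below is a model-based baseline, not the model-free algorithm proposed in the paper: it assumes dynamics x' = Ax + Bu + Dw with cost weights Q, R and attenuation level gamma, all of which are illustrative symbols and values that do not appear in the abstract.

```python
import numpy as np


def solve_gare(A, B, D, Q, R, gamma):
    """Stabilizing solution P of A'P + PA + Q - P S P = 0,
    with S = B R^{-1} B' - gamma^{-2} D D' (assumed game formulation)."""
    n = A.shape[0]
    S = B @ np.linalg.solve(R, B.T) - (D @ D.T) / gamma**2
    # Hamiltonian matrix whose stable invariant subspace encodes P.
    H = np.block([[A, -S], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]
    if stable.shape[1] != n:
        raise ValueError("no n-dimensional stable subspace; try a larger gamma")
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)  # symmetrize against round-off


# Hypothetical example: a double integrator with a matched disturbance.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 5.0

P = solve_gare(A, B, D, Q, R, gamma)
K_u = np.linalg.solve(R, B.T @ P)   # minimizing player: u = -K_u x
K_w = (D.T @ P) / gamma**2          # maximizing player: w =  K_w x
```

Under these assumptions, the pair (u, w) built from K_u and K_w is the saddle-point policy for this one time-invariant game. A model-free method would have to estimate the same P from trajectory data, without access to A, B, or D.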