Eligibility Trace Method for Quadratic Optimal Control of Markov Jump Linear Systems

DOI: 10.12677/pm.2024.145216, pp. 629-643

Keywords: Optimal Control, Markov Jump Linear Systems, Eligibility Traces

Abstract:

This paper studies the application of eligibility trace methods to the quadratic optimal control problem for Markov jump linear systems (MJLS-LQR). Conventional methods obtain the optimal control by solving coupled algebraic Riccati equations rather than by optimizing the policy parameters directly. Building on model-free reinforcement learning, this paper introduces eligibility traces to optimize the policy parameters directly, and develops the eligibility trace method for the MJLS-LQR problem in two settings: known and unknown system parameters. When the parameters are unknown, the eligibility trace cannot be represented exactly from system information, so it is approximated via zeroth-order optimization, which also extends the approach to non-convex cost functions. Under a finite time horizon and Gaussian noise, global convergence guarantees are given for the algorithms in both settings. Numerical simulations show that the eligibility trace method converges faster than gradient descent.
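
To make the model-free setting concrete, the sketch below estimates the gradient of a finite-horizon quadratic cost for a toy Markov jump linear system purely from sampled rollouts, using a standard one-point zeroth-order estimator (Fazel et al., 2018). This is a minimal illustration, not the paper's algorithm: the mode-dependent matrices, transition matrix, step sizes, and gain parameterization are all assumptions made for this example, and the eligibility-trace weighting analyzed in the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-mode MJLS (illustrative assumptions, not taken from the paper).
A = [np.array([[1.0, 0.1], [0.0, 1.0]]),
     np.array([[0.9, 0.2], [0.0, 0.8]])]
B = [np.array([[0.0], [0.1]]),
     np.array([[0.1], [0.1]])]
P = np.array([[0.9, 0.1],          # Markov transition matrix between modes
              [0.2, 0.8]])
Q, R = np.eye(2), np.eye(1)
T = 50                             # finite horizon

def rollout_cost(K, x0=np.array([1.0, 0.0]), mode0=0):
    """Finite-horizon quadratic cost under mode-dependent gains u = -K[i] x."""
    x, i, cost = x0.copy(), mode0, 0.0
    for _ in range(T):
        u = -K[i] @ x
        cost += x @ Q @ x + u @ R @ u
        x = A[i] @ x + B[i] @ u
        i = rng.choice(2, p=P[i])  # Markov mode jump
    return cost

def zeroth_order_grad(K, r=0.1, n_samples=200):
    """One-point zeroth-order estimate: grad J(K) ~ (d/r) E[J(K + r U) U],
    with U uniform on the unit sphere and d the number of gain entries."""
    d, grad = K.size, np.zeros_like(K)
    for _ in range(n_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)     # uniform direction on the unit sphere
        grad += rollout_cost(K + r * U) * U
    return (d / (r * n_samples)) * grad

K = np.zeros((2, 1, 2))            # one 1x2 feedback gain per mode
for step in range(100):
    K -= 1e-4 * zeroth_order_grad(K)
print("average cost:", np.mean([rollout_cost(K) for _ in range(100)]))
```

A two-point (antithetic) estimator, using J(K + rU) - J(K - rU), typically reduces the variance of this estimate; the paper's contribution is to accelerate such model-free updates with eligibility traces rather than relying on plain gradient descent.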
