%0 Journal Article
%T Eligibility Trace Method for Quadratic Optimal Control of Markovian Jump Linear Quadratic Control
%A 朱亚楠
%J Pure Mathematics
%P 629-643
%@ 2160-7605
%D 2024
%I Hans Publishing
%R 10.12677/pm.2024.145216
%X This paper studies the application of eligibility trace methods to the quadratic optimal control problem for Markov jump linear systems (MJLS-LQR). Common approaches obtain the optimal control by solving coupled algebraic Riccati equations rather than by directly optimizing the policy parameters. Building on model-free reinforcement learning, this paper introduces eligibility traces to optimize the policy parameters directly. The eligibility trace method for the MJLS-LQR problem is considered in two settings: known system parameters and unknown system parameters. When the parameters are unknown, the eligibility trace cannot be expressed exactly in terms of the system parameters, so it is approximated using zeroth-order optimization, which also allows the approach to extend to non-convex cost functions. Global convergence guarantees are given for the algorithms in both settings under a finite time horizon and Gaussian noise. Numerical simulations show that the eligibility trace method converges faster than the gradient descent algorithm.
%K Optimal Control
%K Markov Jump Linear Systems
%K Eligibility Traces
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=88744