Pure Mathematics 2025
Design of a Pursuit-Evasion Differential Game Algorithm Based on Multi-Agent Reinforcement Learning
Abstract:
Traditional pursuit-evasion differential game models are difficult to solve in complex real-world environments, particularly under incomplete information and high computational complexity. To address this, this paper proposes an improved multi-agent reinforcement learning method based on the Soft Actor-Critic (SAC) algorithm and applies it to the differential game problem of an unmanned aerial vehicle (UAV) pursuing a single intelligent target. The advantage of SAC in pursuit-evasion differential games lies in its natural realization of mixed strategies: policy stochasticity lets an agent respond to dynamic changes in its opponent's behavior, while the algorithm retains strong exploration capability, stability, and robustness. Compared with other reinforcement learning algorithms, SAC is better suited to games with strong uncertainty, complex opponent behavior, and continuous action spaces. We assume a partially observable environment in which neither the pursuer nor the evader has access to the full state, so each must act on partial observations of the environment. To solve this continuous optimization problem, we adopt the multi-agent Soft Actor-Critic (MASAC) algorithm, which lets both agents in the pursuit-evasion scenario learn their respective optimal strategies through interaction with the environment. Finally, tests demonstrate the applicability and potential of the improved multi-agent reinforcement learning method in UAV pursuit-evasion scenarios under partial observability.
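As background for the claim that SAC "naturally realizes mixed strategies through randomness": in the standard SAC formulation, each agent maximizes an entropy-augmented return rather than the plain expected reward,

$$ J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big] $$

where the temperature $\alpha$ weights the policy entropy $\mathcal{H}$. Keeping $\alpha > 0$ forces the learned policy to remain stochastic, which is precisely what lets a pursuer or evader randomize its actions against an adapting opponent.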
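To make the MASAC setup concrete, the sketch below shows one plausible shape for the stochastic actor that each agent (pursuer and evader) would carry over its own partial observation. This is an illustration under assumed choices only, not the paper's implementation: PyTorch, the observation/action dimensions, the network widths, and the tanh-squashed Gaussian policy are all placeholder assumptions.

```python
# Minimal sketch of a SAC-style stochastic actor for one pursuit-evasion agent.
# Assumptions (not from the paper): obs_dim, act_dim, hidden widths, and the
# tanh-squashed Gaussian policy are illustrative choices only.
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    """Stochastic policy pi(a|o): outputs a tanh-squashed Gaussian action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)       # mean of the Gaussian
        self.log_std = nn.Linear(hidden, act_dim)  # state-dependent log-std

    def forward(self, obs: torch.Tensor):
        h = self.net(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = Normal(mu, log_std.exp())
        u = dist.rsample()                  # reparameterized sample
        a = torch.tanh(u)                   # squash action into [-1, 1]
        # log-prob with the standard tanh change-of-variables correction,
        # summed over action dimensions
        log_prob = (dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)).sum(-1)
        return a, log_prob

# Each side of the game gets its own actor over its partial observation;
# the randomness in `a` is what realizes the mixed strategy.
pursuer = GaussianActor(obs_dim=8, act_dim=2)
evader = GaussianActor(obs_dim=8, act_dim=2)
obs = torch.randn(1, 8)                     # a dummy partial observation
action, log_prob = pursuer(obs)
```

In a MASAC-style training loop, each actor would be updated against its own entropy-regularized critic while both agents interact in the shared environment; the sampled (rather than deterministic) actions keep each agent unpredictable to its opponent during both training and execution.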