%0 Journal Article
%T Conservative Policy Gradient and Policy Improvement
%A 黄儒泽
%J Pure Mathematics
%P 218-226
%@ 2160-7605
%D 2025
%I Hans Publishing
%R 10.12677/pm.2025.152062
%X This paper introduces a policy metric for the two-player non-cooperative Markov game model, generalizes the conservative policy to the two-agent case, and gives a conservative policy gradient together with conditions for policy improvement. This provides a foundation and a direction for improvement in finding Nash equilibria under conservative policies in two-player non-cooperative games.
%K Two-Player Non-Cooperative Markov Game
%K Conservative Policy
%K Policy Gradient
%K Policy Improvement
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=108329