
Improvement and Implementation of Q-Learning Algorithm

DOI: 10.12677/CSA.2021.117204, PP. 1994-2007

Keywords: Reinforcement Learning, Play Flappy Bird Game, Q-Learning Algorithm, Deep Convolutional Neural Network


Abstract:

Machine learning has attracted increasing attention and is among the most active directions in artificial intelligence. Part of the recent growth in reinforcement learning research stems from agents reaching levels in video games that humans cannot match. Policy-based reinforcement learning algorithms can adapt well to a game environment, explore a relatively stable path, and pursue a globally optimal objective. This article studies playing the Flappy Bird game with the Q-learning algorithm of reinforcement learning. It first reviews the theory of reinforcement learning, covering Markov decision processes, dynamic programming, value function approximation, and temporal-difference learning. The focus is on building the mathematical model of states, actions, and rewards in Flappy Bird; to obtain the optimal policy, the goal in every state is to maximize the total reward. On this basis, a deep convolutional neural network is trained to recognize and classify images of the game state. The system simulation successfully uses a deep Q-learning model to make Flappy Bird learn on its own: the exploration probability ε decreases linearly from 0.6 to 0 over 550,000 updates, the learning curve is steep at first and then stabilizes, convergence is reached in a relatively short time, and the training error is low. The trained agent achieves the desired performance, with a mean score of 86 and a best score of 335, surpassing ordinary human players.
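
The abstract describes an ε-greedy deep Q-learning setup in which the exploration probability decays linearly from 0.6 to 0 over 550,000 updates. The paper's code is not included here, so the following Python sketch only illustrates that schedule together with the standard one-step Q-learning target; the action set, discount factor γ = 0.99, and all function names are assumptions for illustration, not values taken from the paper.

```python
import random

# Linear ε-decay schedule reported in the abstract: 0.6 → 0 over 550,000 updates.
EPS_START, EPS_END, EPS_DECAY_STEPS = 0.6, 0.0, 550_000

# Hypothetical Flappy Bird action set: 0 = do nothing, 1 = flap.
ACTIONS = (0, 1)


def epsilon(step: int) -> float:
    """Exploration probability after `step` parameter updates."""
    frac = min(step / EPS_DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_values, step: int) -> int:
    """ε-greedy selection: random action with probability ε, else argmax Q(s, a)."""
    if random.random() < epsilon(step):
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_values[a])


def q_learning_target(reward: float, next_q_values, done: bool, gamma: float = 0.99) -> float:
    """One-step Q-learning target r + γ · max_a' Q(s', a'); γ = 0.99 is an assumed value."""
    if done:
        return reward
    return reward + gamma * max(next_q_values[a] for a in ACTIONS)
```

In the full method described by the abstract, `q_values` would come from the deep convolutional network applied to preprocessed game frames, and the target above would be regressed with a suitable loss during training; those network and training details are not specified in the abstract and are left out of this sketch.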

