Control Theory & Applications, 2015

Learning to control by neural networks using eligibility traces

DOI: 10.7641/CTA.2015.40367

Liu Zhibin (刘智斌), Zeng Xiaoqin (曾晓勤), Xu Yan (徐彦), Yu Jiguo (禹继国)

Keywords: reinforcement learning; neural networks; eligibility traces; cart-pole system; gradient descent


Abstract:

Reinforcement learning is an important approach to adaptive problems and is widely used for learning control in continuous state spaces, but it suffers from low learning efficiency and slow convergence. To address these deficiencies, we combine back-propagation (BP) neural networks with eligibility traces and propose an algorithm, given with a complete description, that performs multi-step updates during reinforcement learning. The algorithm solves the problem of back-propagating the local gradient of the output layer to the hidden-layer nodes, so that the hidden-layer weights are updated rapidly. We also propose a modified residual method that, during network training, combines the weight updates of each layer by linearly optimized weighting, obtaining both the learning speed of the direct gradient descent method and the convergence properties of the residual gradient method. Applying it to the hidden-layer weight updates improves the convergence of the value function. The algorithm is verified and analyzed in a simulation of a cart-pole balancing system. The results show that, after a relatively short period of learning, the method successfully controls the cart-pole system and significantly improves learning efficiency.
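The abstract gives no equations, so the following is only a rough sketch of the kind of update it describes, not the authors' algorithm. It assumes a one-hidden-layer tanh network with a linear output, accumulating eligibility traces on every weight, and a mix of the direct and residual gradients in the style of residual algorithms, controlled by a hypothetical weight `phi` (0 gives the direct gradient, 1 the residual gradient). The class name `TraceValueNet` and all parameter values are illustrative.

```python
import math
import random


class TraceValueNet:
    """Value network V(s) trained by TD(lambda) with per-weight eligibility traces."""

    def __init__(self, n_in, n_hidden, alpha=0.05, gamma=0.95, lam=0.8,
                 phi=0.3, seed=0):
        rng = random.Random(seed)
        self.alpha, self.gamma, self.lam, self.phi = alpha, gamma, lam, phi
        # Hidden weights: one row per hidden unit, last column is the bias.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        # Output weights: last entry is the bias.
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.reset_traces()

    def reset_traces(self):
        """Clear all traces; call at the start of each episode."""
        self.e_h = [[0.0] * len(row) for row in self.w_h]
        self.e_o = [0.0] * len(self.w_o)

    def _forward(self, s):
        x = list(s) + [1.0]
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
             for row in self.w_h]
        h.append(1.0)
        v = sum(w * hi for w, hi in zip(self.w_o, h))
        return x, h, v

    def value(self, s):
        return self._forward(s)[2]

    def _gradients(self, s):
        """Gradient of V(s) w.r.t. hidden and output weights (backpropagation)."""
        x, h, _ = self._forward(s)
        g_o = h[:]  # dV/dw_o[j] = h[j]
        # The output-layer term w_o[j] is propagated back through tanh'.
        g_h = [[self.w_o[j] * (1.0 - h[j] ** 2) * xi for xi in x]
               for j in range(len(self.w_h))]
        return g_h, g_o

    def update(self, s, r, s_next, terminal=False):
        """One TD(lambda) step; returns the TD error."""
        v_next = 0.0 if terminal else self.value(s_next)
        delta = r + self.gamma * v_next - self.value(s)
        g_h, g_o = self._gradients(s)
        n_h, n_o = self._gradients(s_next)
        decay = self.gamma * self.lam
        # Assumed mixing: weight on the successor-state gradient.
        c = 0.0 if terminal else self.phi * self.gamma
        for j in range(len(self.w_o)):
            self.e_o[j] = decay * self.e_o[j] + g_o[j] - c * n_o[j]
            self.w_o[j] += self.alpha * delta * self.e_o[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                self.e_h[j][i] = decay * self.e_h[j][i] + g_h[j][i] - c * n_h[j][i]
                row[i] += self.alpha * delta * self.e_h[j][i]
        return delta
```

In a cart-pole setting, `s` would be the state vector (for example cart position and velocity, pole angle and angular velocity), `reset_traces()` would be called at the start of each trial, and `update()` once per control step. Because the traces carry earlier gradients forward, a single TD error adjusts the weights for several preceding states at once, which is the multi-step effect the abstract refers to.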
