%0 Journal Article
%T An Algorithm for Jamming Decision Using Dual Reinforcement Learning
%A 颛孙少帅
%A 杨俊安
%A 刘辉
%A 黄科举
%J 西安交通大学学报
%D 2018
%R 10.7652/xjtuxb201802010
%X A novel algorithm for jamming decision using dual reinforcement learning (DRLJD) is proposed to accelerate the convergence of reinforcement learning algorithms in jamming decision. First, a model of equivalent communication parameters is constructed, which reduces both the number of parameters to be learned and the dimension of the search space. Second, the reduced search space is used to guide the choice of jamming parameters, avoiding the poor jamming performance caused by random selection. Finally, the selected parameters are used to apply jamming, and the feedback from the environment is used to reduce the dimension of the search space further; through this continual interaction, the convergence of the algorithm is accelerated. Moreover, previous jamming experience is incorporated into the learning process as prior information, which further shortens the learning time. Simulation results on constructed jamming problems show that DRLJD learns an optimal or suboptimal jamming strategy within 200 interactions, fewer than the 600 interactions required by existing algorithms, and that using prior information further reduces the number of interactions required. When the proposed new reward standard is used as the basis for rewards, the algorithm can learn the optimal jamming strategy without knowing the communication protocol, at the cost of a longer interaction time.
%K reinforcement learning
%K dual reinforcement learning
%K jamming decision
%K prior information
%K reward standard
%U http://zkxb.xjtu.edu.cn/oa/DArticle.aspx?type=view&id=201802010