%0 Journal Article
%T Research on Generative Dialogue System Based on Reinforcement Learning
%A 颜永
%A 白宗文
%J Hans Journal of Data Mining
%P 185-193
%@ 2163-1468
%D 2023
%I Hans Publishing
%R 10.12677/HJDM.2023.132018
%X This paper constructs an open-domain dialogue system model with diverse responses, aiming to address the problem of monotonous replies in dialogue systems. It proposes a generative dialogue method that combines a bidirectional long short-term memory (BiLSTM) neural network with a reinforcement learning model. First, the corpus is preprocessed with several types of filters so that the dialogue corpus can be explored in a diversified way. Second, to increase the diversity of the system's replies, diverse beam search is used as the decoder. Finally, in the fine-tuning stage, self-critical sequence training is used to reduce the high variance of the REINFORCE policy gradient. Compared with the method of Srinivasan et al., the proposed method improves BLEU, ROUGE-L, and Perplexity by 10.5%, 9%, and 5%, respectively, and shortens model training time by 43%. Because some types of corpus data are scarce, the dialogue system remains relatively weak on the corresponding topics. Combining a traditional network architecture with reinforcement learning can effectively enable the dialogue system to produce valuable replies.
%K Dialogue System
%K Reinforcement Learning
%K Diversity Exploration
%K Response Diversity
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=64720