Research on Generative Dialogue System Based on Reinforcement Learning

DOI: 10.12677/HJDM.2023.132018, PP. 185-193

Keywords: Dialogue System, Reinforcement Learning, Diversity Exploration, Response Diversity



An open-domain dialogue model with diverse responses is constructed to address the monotonous replies that dialogue systems tend to produce. This paper proposes a generative dialogue method that combines a bidirectional long short-term memory (Bi-LSTM) network with a reinforcement learning model. First, the corpus is preprocessed with several types of filters so that the dialogue corpus can be explored in diverse ways; second, to increase the diversity of the system's replies, diverse beam search is used as the decoder; finally, in the fine-tuning stage, self-critical sequence training is used to reduce the high variance of the REINFORCE policy gradient. Compared with Srinivasan's method, the proposed method improves BLEU, ROUGE-L, and Perplexity by 10.5%, 9%, and 5% respectively, and shortens model training time by 43%. Because some categories of the corpus contain relatively few samples, the range of dialogue topics remains somewhat limited; even so, the traditional network architecture combined with reinforcement learning effectively enables the dialogue system to produce valuable replies.
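The variance-reduction step in the fine-tuning stage can be illustrated with a minimal sketch of the self-critical sequence training (SCST) objective [12]: the reward of the model's own greedy decode serves as the baseline for the REINFORCE gradient. The function name and the scalar reward/log-probability inputs below are illustrative assumptions, not the paper's implementation.

```python
import math

def self_critical_loss(sample_reward, greedy_reward, sample_log_prob):
    """SCST policy-gradient loss for one sampled reply.

    The greedy decode's reward acts as a baseline, so the advantage
    (sample_reward - greedy_reward) centres the REINFORCE update and
    reduces its variance without training a separate value network.
    """
    advantage = sample_reward - greedy_reward
    # REINFORCE maximizes advantage * log_prob, so we minimize its negation.
    return -advantage * sample_log_prob

log_p = math.log(0.2)  # log-probability of the sampled reply

# A sampled reply scoring above the greedy baseline yields a loss whose
# gradient raises its probability ...
loss_good = self_critical_loss(sample_reward=0.8, greedy_reward=0.5,
                               sample_log_prob=log_p)
# ... while one scoring below the baseline is suppressed.
loss_bad = self_critical_loss(sample_reward=0.3, greedy_reward=0.5,
                              sample_log_prob=log_p)
```

In practice the rewards would come from a sentence-level metric or a learned reward model, and the loss would be averaged over a batch of sampled replies.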


[1]  Li, J., Monroe, W., Ritter, A., Jurafsky, D., Galley, M. and Gao, J. (2016) Deep Reinforcement Learning for Dialogue Generation. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, 1-4 November 2016, 1192-1202.
[2]  Li, J., Galley, M., Brockett, C., Gao, J. and Dolan, B. (2016) A Diversity Promoting Objective Function for Neural Conversation Models. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, 12-17 June 2016, 110-119.
[3]  Srinivasan, V., Santhanam, S. and Shaikh, S. (2019) Natural Language Generation Using Reinforcement Learning with External Rewards. ArXiv Preprint ArXiv: 1911.11404.
[4]  Liu, Y., Zhang, L., Han, W., Zhang, Y. and Tu, K. (2021) Constrained Text Generation with Global Guidance—Case Study on CommonGen. ArXiv Preprint ArXiv: 2103.07170.
[5]  Ive, J., Li, A.M., Miao, Y., et al. (2021) Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation. ArXiv Preprint ArXiv: 2102.11387.
[6]  Srinivasan, V., Santhanam, S. and Shaikh, S. (2019) Natural Language Generation Using Reinforcement Learning with External Rewards. ArXiv Preprint ArXiv: 1911.11404.
[7]  Liu, Q., Chen, Y., Chen, B., et al. (2020) You Impress Me: Dialogue Generation via Mutual Persona Perception. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5-10 July 2020, 1417-1427.
[8]  Vijayakumar, A.K., et al. (2017) Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models. ArXiv Preprint ArXiv: 1610.02424.
[9]  Arulkumaran, K., Deisenroth, M.P., Brundage, M. and Bharath, A.A. (2017) A Brief Survey of Deep Reinforcement Learning. ArXiv Preprint ArXiv: 1708.05866.
[10]  Danescu-Niculescu-Mizil, C. and Lee, L. (2011) Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs. ArXiv Preprint ArXiv: 1106.3077.
[11]  Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. and Kavukcuoglu, K. (2016) Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning, New York, 19-24 June 2016, 1928-1937.
[12]  Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J. and Goel, V. (2017) Self-Critical Sequence Training for Image Captioning. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 21-26 July 2017, 1179-1195.
[13]  Xu, C., Li, P., Wang, W., et al. (2022) COSPLAY: Concept Set Guided Personalized Dialogue Generation Across Both Party Personas. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, 11-15 July 2022, 201-211.
[14]  Cao, Y., Bi, W., Fang, M., Shi, S. and Tao, D. (2022) A Model-Agnostic Data Manipulation Method for Persona-Based Dialogue Generation. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, 22-27 May 2022, 7984-8002.
[15]  Papineni, K., Roukos, S., Ward, T. and Zhu, W.-J. (2002) Bleu: A Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, 7-12 July 2002, 311-318.
[16]  Gao, J. (2020) Research on Diverse Response Generation Methods for Open-Domain Dialogue Systems. Master's Thesis, Soochow University, Suzhou. (In Chinese)
[17]  Wang, J. (2020) Research on Emotional Dialogue Response Generation Algorithms Based on Reinforcement Learning. Master's Thesis, Guilin University of Electronic Technology, Guilin. (In Chinese)

