ChatGPT with Agent-Based Services for Multi-Party Conversation Tasks
Abstract:
With the rapid scaling of Large Language Models (LLMs), these models have demonstrated remarkable zero-shot learning capabilities across a wide range of natural language processing (NLP) tasks, performing well without dataset-specific pre-training, and have shown strong generalization in language-related applications such as search engines. However, their ability to handle Multi-Party Conversation (MPC) — a setting in which multiple participants exchange complex information — remains underexplored. This paper evaluates the potential of generative LLMs, such as ChatGPT and GPT-4, in the MPC domain. Through an empirical analysis on two MPC datasets covering four representative tasks, we assess their zero-shot learning abilities. The results show that ChatGPT still has considerable room for improvement on several MPC tasks, while GPT-4 yields promising results. In addition, we attempt to improve model performance by incorporating MPC structure and agent mechanisms, including speaker architectures and agent methods tailored to the four tasks. This study provides a comprehensive evaluation of generative LLMs on multi-party conversation and analyzes ideas and strategies for building more effective and powerful MPC agents, offering new insights for the field. Finally, we highlight the challenges LLMs face in MPC applications, particularly in parsing complex information flows and generating stylistically consistent responses, which may limit their practical effectiveness.