An Open-Domain Question Answering Method for Large Language Models Based on Retrieval-Augmented Generation and Soft Prompt Optimization
Abstract:
To address the limitations of large language models (LLMs) in handling long-tail knowledge in open-domain question answering, this paper proposes SOFTRAG, a novel framework that integrates Retrieval-Augmented Generation (RAG) with soft prompt optimization. The framework aims to improve how efficiently the model exploits low-frequency knowledge and to mitigate the limitations of conventional approaches. The study combines RAG with soft prompt optimization, introducing a Perceiver-based soft prompt adapter to extract key information and employing the LoRAMoE method for parameter-efficient fine-tuning. Evaluated on the PopQA, TriviaQA, PubHealth, and ASQA datasets, SOFTRAG delivers significant gains in accuracy, reasoning precision, and generalization over retrieval-free baselines and conventional RAG methods. Ablation experiments further confirm that the soft prompt adapter, the retrieval module, and the fine-tuning technique each make critical contributions to these gains. The proposed method effectively balances performance against computational cost, substantially improving LLM performance on long-tail knowledge tasks and offering a new optimization perspective for open-domain question answering.
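To make the architecture described in the abstract concrete, the sketch below illustrates in PyTorch the two components it names: a Perceiver-style soft prompt adapter that compresses retrieved-passage embeddings into a short sequence of soft prompt vectors, and a LoRAMoE-style wrapper that adds a routed mixture of low-rank experts to a frozen linear layer for parameter-efficient fine-tuning. This is a minimal illustrative sketch, not the authors' implementation; all class names, dimensions, and hyperparameters are assumptions.

```python
# Minimal illustrative sketch (PyTorch) of the two components named in the
# abstract. NOT the authors' implementation: every name, dimension, and
# hyperparameter below is an assumption made for demonstration only.
import torch
import torch.nn as nn


class PerceiverSoftPromptAdapter(nn.Module):
    """Compresses retrieved-passage token embeddings into a fixed number of
    soft prompt vectors via cross-attention from learned latent queries."""

    def __init__(self, hidden_dim: int = 1024, num_latents: int = 32,
                 num_heads: int = 8, num_layers: int = 2):
        super().__init__()
        # Learned latent queries; their count fixes the soft prompt length.
        self.latents = nn.Parameter(torch.randn(num_latents, hidden_dim) * 0.02)
        self.attn_layers = nn.ModuleList(
            nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
            for _ in range(num_layers))
        self.norms = nn.ModuleList(nn.LayerNorm(hidden_dim) for _ in range(num_layers))

    def forward(self, passage_embeds: torch.Tensor) -> torch.Tensor:
        # passage_embeds: (batch, num_passage_tokens, hidden_dim), e.g. the
        # encoded tokens of the top-k retrieved passages concatenated together.
        batch = passage_embeds.size(0)
        latents = self.latents.unsqueeze(0).expand(batch, -1, -1)
        for attn, norm in zip(self.attn_layers, self.norms):
            # Latent queries attend to the (much longer) passage sequence.
            out, _ = attn(latents, passage_embeds, passage_embeds)
            latents = norm(latents + out)
        # (batch, num_latents, hidden_dim): soft prompts to prepend to the
        # LLM's input embeddings before generation.
        return latents


class LoRAMoELinear(nn.Module):
    """Wraps a frozen linear layer with a routed mixture of low-rank (LoRA)
    experts, in the spirit of LoRAMoE-style parameter-efficient fine-tuning."""

    def __init__(self, base: nn.Linear, num_experts: int = 4,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep pretrained weights frozen
            p.requires_grad = False
        self.router = nn.Linear(base.in_features, num_experts)
        self.A = nn.Parameter(torch.randn(num_experts, base.in_features, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)                    # (..., E)
        # Per-expert low-rank update, then mix according to the router gates.
        delta = torch.einsum('...i,eir,ero->...eo', x, self.A, self.B)  # (..., E, out)
        delta = (gate.unsqueeze(-1) * delta).sum(dim=-2)                # (..., out)
        return self.base(x) + self.scale * delta
```

In such a pipeline, the adapter's output would be prepended to the question's token embeddings before the LoRAMoE-augmented LLM generates an answer; retrieval itself can be any off-the-shelf dense retriever.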