A Survey of Text Generation Based on Large Language Models
Abstract:
Text generation is a core technology in Natural Language Processing (NLP). Owing to the intrinsic complexity of natural language, and driven by practical demands in applications such as content creation, human-machine dialogue, and machine translation, text generation has long been a central, challenging, and active topic of NLP research. With the emergence and development of deep learning and pre-trained language models, text generation has advanced considerably, and the advent of large language models (LLMs) built on the Transformer architecture has brought a revolutionary breakthrough to the field. This paper reviews the history and current state of text generation techniques, models, and paradigms, with particular emphasis on the changes LLMs have introduced to text generation in terms of model frameworks, technical approaches, and evaluation benchmarks, as well as on representative application scenarios of LLMs in text generation. Finally, it discusses future research directions and technological trends for text generation in the era of LLMs.