A Weakness-Enhancement-Based LLM Knowledge Distillation Algorithm
Abstract:
Advances in large language models (LLMs) have fundamentally changed the knowledge distillation paradigm in natural language processing. Prompt-based knowledge acquisition with LLMs has shifted distillation toward more general forms of knowledge extraction and data augmentation. To address this shift in the LLM era and the challenge of few-shot learning with small models, this paper proposes a weakness-enhancement-based LLM knowledge distillation algorithm. Leveraging the semantic understanding and text generation capabilities of LLMs, the algorithm analyzes the weaknesses of a student model under few-shot conditions; the LLM teacher then constructs augmented samples targeting those weaknesses and iteratively retrains the student, strengthening its capabilities. Experimental results on multiple natural language processing tasks show that, with only a small number of labeled samples, the proposed method substantially improves training effectiveness through knowledge distillation, demonstrating its effectiveness.
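To make the described pipeline concrete, the following is a minimal sketch of the weakness-enhanced distillation loop: evaluate the student, collect the samples it gets wrong, ask the LLM teacher for similar labeled samples, and retrain. All helper names (train_student, predict, llm_generate_similar) are hypothetical placeholders, not the paper's actual implementation.

```python
# Sketch of weakness-enhanced distillation, assuming user-supplied helpers.
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (input text, label)

def weakness_enhanced_distillation(
    seed_data: List[Sample],
    dev_data: List[Sample],
    train_student: Callable[[List[Sample]], object],
    predict: Callable[[object, str], str],
    llm_generate_similar: Callable[[Sample, int], List[Sample]],
    rounds: int = 3,
    n_aug_per_error: int = 5,
) -> object:
    """Iteratively augment the training set on the student's weak samples."""
    data = list(seed_data)
    student = train_student(data)  # initial few-shot training
    for _ in range(rounds):
        # 1. Weakness analysis: dev samples the current student misclassifies.
        errors = [(x, y) for x, y in dev_data if predict(student, x) != y]
        if not errors:
            break
        # 2. Teacher augmentation: the LLM generates new labeled samples
        #    similar to each error case (targeted data augmentation).
        for err in errors:
            data.extend(llm_generate_similar(err, n_aug_per_error))
        # 3. Retrain the student on the enlarged dataset.
        student = train_student(data)
    return student
```

In this sketch the stopping criterion is a fixed number of rounds or an empty error set; other criteria (e.g. no further dev-set improvement) would fit the same loop structure.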