E-Commerce Letters 2024
Research on an LLM-Driven Chinese Medical Dialogue System for Medical E-Commerce Platforms
Abstract:
With the rapid development of Internet technology and artificial intelligence, medical e-commerce platforms play an increasingly important role in modern pharmaceutical services. This study proposes MedAsst, a Chinese medical dialogue system built on a large language model (LLM), and explores its application in medical e-commerce platforms. MedAsst takes Qwen2-7B as its base model and is supervised fine-tuned with the LoRA method on 1.47 million medical question-answer pairs. Its effectiveness is evaluated comprehensively on a medical multiple-choice benchmark and a custom medical question-answering dataset. Experimental results show that MedAsst outperforms the baseline models on BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L, with a particularly clear advantage in medical question answering. Compared with LLaMA-3-8B, Gemma-7B, Mistral-7B, and the un-fine-tuned Qwen2-7B, MedAsst performs strongly on domain-specific tasks through a sound fine-tuning strategy, demonstrating the necessity and effectiveness of supervised fine-tuning. This work not only improves performance on Chinese medical question answering but also demonstrates the application potential of large language models on medical e-commerce platforms, providing solid support for future optimization and practical deployment in more complex scenarios.
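To make the fine-tuning setup concrete, the sketch below adapts Qwen2-7B with LoRA using the Hugging Face transformers and peft libraries. The rank, scaling factor, dropout, and target modules are illustrative assumptions rather than the paper's reported configuration, and the training loop itself is omitted.

    # Minimal LoRA adaptation sketch for Qwen2-7B; hyperparameters are
    # assumptions for illustration, not the paper's reported values.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "Qwen/Qwen2-7B"
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # LoRA freezes the 7B base weights and trains only low-rank update
    # matrices injected into the attention projections.
    lora_cfg = LoraConfig(
        r=8,                # low-rank dimension (assumed)
        lora_alpha=16,      # scaling factor (assumed)
        lora_dropout=0.05,  # regularization on the adapter path (assumed)
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # typically well under 1% of all weights

This is what makes supervised fine-tuning on 1.47 million QA pairs tractable: only the small adapter matrices receive gradients, so memory and compute costs drop sharply compared with full-parameter tuning.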
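The reported metrics can be sketched in the same spirit. The snippet below computes BLEU-4 and ROUGE-1/2/L for one hypothetical answer pair using nltk and the rouge package; character-level tokenization of the Chinese text is an assumption, since the abstract does not state the tokenization scheme used.

    # Sketch of the evaluation protocol: BLEU-4 and ROUGE-1/2/L between a
    # model answer and a reference. The example strings are hypothetical.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from rouge import Rouge

    def char_tokens(text: str) -> list[str]:
        # Split Chinese text into characters (assumed tokenization scheme).
        return list(text.replace(" ", ""))

    reference = "布洛芬可用于缓解轻中度疼痛和发热。"
    candidate = "布洛芬常用于缓解疼痛和退热。"

    # BLEU-4: geometric mean of 1- to 4-gram precision with a brevity
    # penalty; smoothing avoids zero scores on short answers.
    bleu4 = sentence_bleu(
        [char_tokens(reference)], char_tokens(candidate),
        weights=(0.25, 0.25, 0.25, 0.25),
        smoothing_function=SmoothingFunction().method1,
    )

    # ROUGE-1/2/L: unigram, bigram, and longest-common-subsequence
    # overlap; the rouge package expects space-separated token strings.
    scores = Rouge().get_scores(
        " ".join(char_tokens(candidate)), " ".join(char_tokens(reference))
    )[0]
    print(f"BLEU-4: {bleu4:.3f}")
    print({k: round(v["f"], 3) for k, v in scores.items()})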