%0 Journal Article
%T Research on Text Classification of Anonymous Data Based on FastText Model
%A 朱美瑶
%A 张寅昊
%A 王宇喆
%A 钟美君
%J Statistics and Applications
%P 563-568
%@ 2325-226X
%D 2023
%I Hans Publishing
%R 10.12677/SA.2023.122060
%X This paper examines whether, when the data have been anonymized, the FastText model is a better solution to text classification than other machine learning models. 200,000 Chinese text samples from a public news dataset are anonymized and then classified with logistic regression, LGBM, random forest and FastText models. Based on the results, two improvements to FastText are proposed. Evaluated on multiple metrics, the FastText model outperforms the other models in both accuracy and runtime efficiency.
%K Data Anonymization
%K FastText
%K TF-IDF
%K Text Classification
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=64962