A Study of Sentiment Orientation in Weibo Comments Based on Doc2vec
Abstract:
This paper analyzes the sentiment orientation of Weibo comments about vaccination. A neural-network-based Doc2vec model is first used to train text vectors. Three machine learning algorithms, Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR), are then used to complete the sentiment classification task, and their classification performance is compared under four different Doc2vec model configurations (Schemes 1 to 4). For text vectors trained with the Distributed Memory version of Paragraph Vector (PV-DM) algorithm, RF performs best, achieving the highest F1 scores under Scheme 1 and Scheme 2, 87.24% and 87.50% respectively. For text vectors trained with the Distributed Bag of Words version of Paragraph Vector (PV-DBOW) algorithm, SVM performs best, achieving the highest F1 scores under Scheme 3 and Scheme 4, 84.11% and 83.91% respectively.
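The results above are reported as F1 scores, the harmonic mean of precision and recall. The abstract does not say how the score was averaged across classes, so the following is only a minimal sketch that assumes a binary F1 for the positive class. The function name `f1_score_binary` and the toy labels are hypothetical and are not taken from the paper.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score of the `positive` class (assumed binary setting)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not data from the paper): 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

With these toy labels there are 4 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.8 and the F1 score is 0.8.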