
OALib Journal
ISSN: 2333-9721


Research on Chinese Text Feature Extraction and Sentiment Analysis Based on Combination Network

DOI: 10.4236/oalib.1106905, pp. 1-12

Subject Areas: Linguistics

Keywords: Feature Selection, Word Vector Representation, Neural Network, CNN-BiLSTM-Attention, Sentiment Analysis


Abstract

The complexity of the Chinese language poses a great challenge to sentiment analysis. Traditional manual feature selection tends to produce semantically inaccurate word segmentation, and high-quality preprocessing is of great significance to subsequent network model learning. To extract the key features of sentences effectively, retaining feature words while removing irrelevant noise and reducing vector dimensions, an algorithm module that combines a sentiment lexicon with Word2vec incremental training is proposed for feature engineering. First, the data set is cleaned and each sentence is segmented with Jieba after loading a custom sentiment lexicon. Second, stop words are removed and the remaining tokens are trained with the Skip-gram algorithm to obtain a word vector model. Third, the model is incrementally trained on a large corpus to obtain a more accurate word vector model. Finally, the word vectors are fed through the embedding layer into the neural network model, which learns the features and classifies the text. Comparison experiments across multiple models show that the combined model (CNN-BiLSTM-Attention) achieves better classification performance and better applicability.
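The stop-word removal and Skip-gram steps in the abstract can be illustrated with a minimal sketch in plain Python. It shows only how Skip-gram forms its (center, context) training pairs from an already segmented sentence; it does not train vectors and is not the authors' implementation. The function names, the example sentence, the stop-word set, and the window size are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop tokens that appear in the stop-word set, preserving order."""
    return [t for t in tokens if t not in stop_words]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return a (center, context) pair for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is never its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical segmented review ("the screen of this phone is very clear").
# In the pipeline described above, the tokens would come from Jieba with a
# custom sentiment lexicon loaded; here they are written out by hand.
segmented = ["这", "手机", "的", "屏幕", "很", "清晰"]
stop_words = {"这", "的"}  # illustrative stop-word set, not the authors' list

filtered = remove_stop_words(segmented, stop_words)
pairs = skipgram_pairs(filtered, window=1)
```

A Skip-gram model is then trained to predict each context word from its center word, so words that occur in similar contexts end up with similar vectors. Incremental training, as described in the abstract, would continue this process on pairs drawn from a larger corpus.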

Cite this paper

Xu, H. and Yang, L. (2020). Research on Chinese Text Feature Extraction and Sentiment Analysis Based on Combination Network. Open Access Library Journal, 7, e6905. doi: http://dx.doi.org/10.4236/oalib.1106905.

