The complexity of the Chinese language poses a great challenge to sentiment analysis. Traditional manual feature selection is prone to inaccurate word segmentation and, as a result, inaccurate semantics. High-quality preprocessing is therefore of great significance to subsequent network model learning. To extract the key features of sentences effectively, retaining feature words while removing irrelevant noise and reducing vector dimensionality, a feature-engineering module that combines a sentiment lexicon with incremental Word2vec training is proposed. First, the data set is cleaned and each sentence is segmented with Jieba after loading a custom sentiment lexicon. Second, stop words are removed and the remaining tokens are used to train a word vector model with the Skip-gram algorithm. Third, the model is incrementally trained on a large corpus to obtain a more accurate word vector model. Finally, the resulting word vectors are fed through the embedding layer into the neural network model, which learns the features and classifies the text. Comparison experiments across multiple models show that the combined model (CNN-BiLSTM-Attention) achieves better classification performance and better applicability.
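The first two preprocessing steps (cleaning, lexicon-aware segmentation, stop-word removal) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: forward maximum matching is used here as a simple stand-in for Jieba with a custom lexicon, and the function names, the example lexicon, the stop-word list, and the rule of keeping only CJK characters during cleaning are all assumptions made for illustration.

```python
import re
from typing import List, Set


def clean(text: str) -> str:
    """Keep only CJK ideographs (assumed cleaning rule, for illustration)."""
    return re.sub(r"[^\u4e00-\u9fff]", "", text)


def segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching, a stand-in for Jieba with a custom lexicon.

    At each position the longest lexicon entry (up to max_len characters)
    is taken as one token; otherwise a single character is emitted.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


def remove_stop_words(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text: str, lexicon: Set[str], stop_words: Set[str]) -> List[str]:
    """Clean, segment, and filter one sentence."""
    return remove_stop_words(segment(clean(text), lexicon), stop_words)


# Hypothetical lexicon and stop-word list, for illustration only.
LEXICON = {"电影", "非常", "好看"}
STOP_WORDS = {"这", "部", "的"}

tokens = preprocess("这部电影,非常好看!!", LEXICON, STOP_WORDS)
print(tokens)
```

Because the lexicon entries are matched as whole units, sentiment-bearing words such as 好看 are not split into single characters. Token lists produced this way are the kind of input a Skip-gram word vector model would then be trained on.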
Cite this paper
Xu, H. and Yang, L. (2020). Research on Chinese Text Feature Extraction and Sentiment Analysis Based on Combination Network. Open Access Library Journal, 7, e6905. doi: http://dx.doi.org/10.4236/oalib.1106905.