2016

Sensitive Information Recognition Based on Short Text Sentiment Analysis

DOI: 10.7652/xjtuxb201609013

Keywords: social networks, sentiment analysis, sensitive information


Abstract:

Existing sensitive information recognition relies on matching sensitive keywords, which gives low accuracy and a high false detection rate. To address these problems, we propose a recognition method that jointly analyzes sensitive keywords and sentiment polarity. On a real dataset, we use supervised learning to measure the sentiment polarity of microblog posts, obtaining a concrete sentiment degree for each post and dividing the texts into two classes, positive and negative polarity. We define 2,639 sensitive keywords in five categories (pornography, violence, prohibited content, cults, and reactionary content) and, from the Zipf distribution these keywords exhibit in the dataset, find that microblog posts with negative sentiment polarity are highly sensitive. We then study how sensitive keywords drive sentiment polarity, construct a sensitivity model that includes a sentiment polarity factor, and propose a method for identifying sensitive information. Compared with the traditional method, detection accuracy (precision) rises from 31.25% to 58.75%, recall rises from 95% to 96%, and the F-measure rises from 47.0% to 72.3%.
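The F-measure quoted in the abstract combines precision and recall. As a minimal sketch, assuming the standard balanced F1 definition F = 2PR / (P + R) (the abstract does not state which variant the authors used), the reported figures can be checked as follows. The function name `f_measure` is illustrative and does not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the balanced F1 score.

    Assumption: the paper's "F value" is the standard harmonic mean of
    precision and recall. The abstract does not confirm this.
    """
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # both inputs are zero, so the score is defined as zero
    return (1 + beta ** 2) * precision * recall / denom


# Keyword matching baseline: precision 31.25%, recall 95%
baseline = f_measure(0.3125, 0.95)
print(f"{baseline:.1%}")  # prints 47.0%

# Keyword plus sentiment polarity method: precision 58.75%, recall 96%
proposed = f_measure(0.5875, 0.96)
print(f"{proposed:.1%}")  # prints 72.9%
```

Under this assumption the baseline reproduces the reported 47.0%. The proposed method comes out near 72.9%, slightly above the reported 72.3%; the gap may come from rounding in the published precision and recall or from a different averaging scheme, which the abstract does not specify.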
