%0 Journal Article
%T Semantic Orientation Identification for Terms From Chinese Micro-blogs Based on Word Affinity Measure (基于词亲和度的微博词语语义倾向识别算法)
%A 唐浩浩
%A 王波
%A 周杰
%A 陈东
%A 刘绍毓
%J 数据采集与处理
%D 2015
%R 10.16337/j.1004-9037.2015.01.013
%X Accurately identifying the semantic orientation of terms and building a high-quality sentiment dictionary is important for improving the accuracy of sentiment analysis on micro-blog text. Traditional corpus-based methods are sensitive to the choice of seed words and cannot effectively identify the semantic orientation of low-frequency terms. This paper proposes an algorithm based on a word affinity measure to identify the semantic orientation of terms from Chinese micro-blogs. First, candidate words are extracted using part-of-speech combination patterns. Second, micro-blog emoticons are selected as seed words and a word affinity network is built. Third, low-frequency words are expanded with the synonym thesaurus Tongyici Cilin (同义词词林), and the semantic orientation similarity between candidate words and seed words is computed. Finally, the semantic orientation of each term is determined against a preset threshold. The proposed algorithm is compared with traditional algorithms on a corpus of two million micro-blog posts, and the experimental results show that it outperforms them.
%K Micro-blog
%K opinioned terms
%K sentiment analysis
%K semantic orientation
%K word affinity measure
%U http://sjcj.nuaa.edu.cn/ch/reader/view_abstract.aspx?file_no=20150113&flag=1