%0 Journal Article
%T Research on Sentiment Analysis Based on Grid Clustering
%A 缪裕青
%A 高韩
%A 刘同来
%A 文益民
%J Journal of University of Science and Technology of China
%D 2016
%R 10.3969/j.issn.0253-2778.2016.10.012
%X Traditional Chinese sentiment analysis methods, whether based on semantic lexicons or on machine learning, produce results that are strongly influenced by human subjectivity and depend to some extent on manually built lexicons, which are difficult to extend. To expand the lexicon, this paper uses the pointwise mutual information (PMI) algorithm, parameter thresholds, and related methods to automatically identify, extract, and classify words that are not included in the HowNet sentiment lexicon but carry a certain sentiment tendency. On that basis, a feature vector model of product reviews is established, and the SCG (sentiment classification based on grid clustering) algorithm is proposed, which builds its classification model with a grid-based clustering algorithm. A dynamic decay factor is introduced into the grid clustering process and sparse grids are periodically removed, which reduces the amount of computation. Experimental results indicate that SCG achieves higher classification accuracy and domain adaptability than other classification algorithms such as Naive Bayes and SMO (sequential minimal optimization).
%K sentiment analysis
%K grid
%K clustering
%K pointwise mutual information (PMI)
%K classification
%U http://just.ustc.edu.cn/CN/abstract/abstract236.shtml