Acta Automatica Sinica (自动化学报), 2012

Sentiment Polarity Discrimination for Short Texts Based on Context Reconstruction

DOI: 10.3724/SP.J.1004.2012.00055, PP. 55-67

Yang Zhen (杨震), Lai Yingxu (赖英旭), Duan Lijuan (段立娟), Li Yu (李玉)

Keywords: public opinion analysis, short text processing, affective computing, error analysis, genetic algorithm


Abstract:

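Text is inherently ambiguous, and short texts add sparse features and missing context, so existing methods cannot reliably resolve meaning; the result is a wide semantic gap between low-level features and high-level expression. This paper attempts to mine the implicit associations among texts through factors such as time, space, and relationships, to reconstruct the context of each text, and thereby to improve sentiment polarity classification. The approach is a two-stage process: 1) short texts are first regrouped into contexts (domains) according to their intrinsic connections; 2) each short text to be processed is assigned to a suitable context (domain) for further processing. The paper first presents a basic framework for short-text sentiment polarity classification based on a Naive Bayes classifier and shows how differences between contexts (domains) affect classification performance. It then discusses an enhanced sentiment polarity classification method based on domain membership, extends the notion of domain to that of a contextual relationship, and proposes a sentiment polarity discrimination method based on specific contextual relationships. To address the difficulty of regrouping contexts when information is missing, it also gives a genetic-algorithm-based scheme for arbitrary context regrouping. Theoretical analysis shows that, provided the constraints are satisfied, the context-reconstruction-based method can reduce the sample error and the approximation error at the same time. Experimental results on real data sets confirm the conclusions of the theoretical analysis.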
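
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a short text is assigned to a context, so the sketch assumes a second Naive Bayes classifier acts as the router, and it assumes context labels are already known for the training texts (the genetic-algorithm regrouping step is not shown). The class names `NaiveBayes` and `ContextualSentiment`, the `phone`/`movie` contexts, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in tokens:
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


class ContextualSentiment:
    """Stage 1: route a text to a context. Stage 2: classify within it."""

    def fit(self, docs, contexts, polarities):
        # Assumption: the router is itself a Naive Bayes classifier.
        self.router = NaiveBayes().fit(docs, contexts)
        grouped = defaultdict(lambda: ([], []))
        for tokens, ctx, pol in zip(docs, contexts, polarities):
            grouped[ctx][0].append(tokens)
            grouped[ctx][1].append(pol)
        # One polarity classifier per context, trained only on that context.
        self.experts = {c: NaiveBayes().fit(d, p) for c, (d, p) in grouped.items()}
        return self

    def predict(self, tokens):
        ctx = self.router.predict(tokens)
        return ctx, self.experts[ctx].predict(tokens)


# Hypothetical toy data: "long" is positive for phones, negative for movies.
data = [
    (["battery", "lasts", "long"], "phone", "pos"),
    (["battery", "long", "life"], "phone", "pos"),
    (["screen", "broke", "fast"], "phone", "neg"),
    (["battery", "died", "fast"], "phone", "neg"),
    (["movie", "long", "boring"], "movie", "neg"),
    (["plot", "long", "slow"], "movie", "neg"),
    (["movie", "fast", "fun"], "movie", "pos"),
    (["plot", "fast", "exciting"], "movie", "pos"),
]
docs, contexts, polarities = map(list, zip(*data))
model = ContextualSentiment().fit(docs, contexts, polarities)

print(model.predict(["battery", "long"]))  # ('phone', 'pos')
print(model.predict(["movie", "long"]))    # ('movie', 'neg')
```

In this toy setting the word "long" carries opposite polarity in the two contexts, which is the kind of context (domain) difference the abstract says affects classification performance; routing first lets each per-context classifier learn its own reading of the word.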
