2016

A Short Text Feature Selection Method Based on Context Reconstruction and Marginal Relevance

Keywords: context reconstruction, expected cross entropy, marginal relevance, feature selection


Abstract:

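Short texts have a sparse feature space and lack context, which makes accurate classification difficult and causes semantic confusion. To address this, we propose a feature selection method based on context compensation and marginal relevance computation. The method extracts and exploits the correlations among texts to build a context feature set for small-sample clusters, reconstructs the feature space, and determines the final feature set through marginal relevance analysis. The process has two stages: 1) compute the context relevance of feature words with a word-vector semantic quantification model; 2) reconstruct the feature space of the test texts and compute marginal relevance. The proposed method preserves the properties and structure of the original feature space while making it more compact, reducing redundancy, and lowering feature dimensionality. Experiments on the standard Reuters-21578 and NewsGroup corpora show that it is more effective than traditional methods such as document frequency, information gain, and mutual information; on both datasets, it outperforms common feature selection methods when run with typical classifiers.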
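The abstract does not give the paper's formulas, so the following is only a minimal sketch of the ideas it names, under stated assumptions. It scores terms with expected cross entropy in its common text-categorization form, ECE(t) = P(t) Σ_c P(c|t) log(P(c|t)/P(c)) (a keyword of the paper, though how the authors apply it is not stated here). It expands a short text with context terms whose word vectors pass a cosine-similarity threshold (a hypothetical stand-in for the context-reconstruction stage). It then picks features greedily by maximal marginal relevance, trading relevance against cosine redundancy with features already chosen. The function names, the threshold of 0.5, the weight `lam = 0.7`, and the toy 2-D word vectors are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

# Minimal sketch: reconstruct() and the MMR weighting are hypothetical,
# not the paper's exact method.


def expected_cross_entropy(docs, labels):
    """ECE(t) = P(t) * sum_c P(c|t) * log(P(c|t) / P(c)), from document counts."""
    n = len(docs)
    class_count = Counter(labels)
    term_df = Counter()          # number of documents containing t
    term_class_df = Counter()    # number of documents of class c containing t
    for tokens, c in zip(docs, labels):
        for t in set(tokens):
            term_df[t] += 1
            term_class_df[(t, c)] += 1
    scores = {}
    for t, df in term_df.items():
        total = 0.0
        for c, nc in class_count.items():
            p_c_given_t = term_class_df[(t, c)] / df
            if p_c_given_t > 0:  # the term 0 * log 0 is taken as 0
                total += p_c_given_t * math.log(p_c_given_t / (nc / n))
        scores[t] = (df / n) * total
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def reconstruct(tokens, context_terms, vectors, threshold=0.5):
    """Hypothetical context compensation: append context terms whose word vector
    is at least `threshold` cosine-similar to some word already in the text."""
    known = [t for t in tokens if t in vectors]
    extra = [c for c in context_terms
             if c not in tokens and c in vectors
             and any(cosine(vectors[c], vectors[t]) >= threshold for t in known)]
    return list(tokens) + extra


def mmr_select(relevance, vectors, k, lam=0.7):
    """Greedy maximal-marginal-relevance selection of k features:
    argmax_t  lam * rel(t) - (1 - lam) * max_{s in selected} cos(v_t, v_s)."""
    top = max(relevance.values(), default=0.0)
    scale = top if top > 0 else 1.0  # rescale relevance to [0, 1] like cosine
    selected, candidates = [], set(relevance)

    def score(t):
        redundancy = max((cosine(vectors[t], vectors[s]) for s in selected),
                         default=0.0)
        return lam * relevance[t] / scale - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy usage with hypothetical data and 2-D word vectors
docs = [["the", "oil", "price"], ["the", "oil", "opec"],
        ["the", "game", "score"], ["the", "game", "team"]]
labels = ["econ", "econ", "sport", "sport"]
ece = expected_cross_entropy(docs, labels)  # "the" is class-neutral, so it scores 0.0
vectors = {"oil": [1.0, 0.0], "crude": [1.0, 0.0], "game": [0.0, 1.0]}
expanded = reconstruct(["oil"], ["crude", "game"], vectors)  # ["oil", "crude"]
chosen = mmr_select({"oil": 1.0, "crude": 0.9, "game": 0.8}, vectors, k=2)  # ["oil", "game"]
```

In the toy run, ranking by relevance alone would keep `oil` and `crude`, whose vectors are identical; the redundancy penalty makes MMR take `game` instead, which is the kind of compactness and reduced redundancy the abstract describes.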

