A Text Feature Weighting Scheme That Strengthens Category Contribution

Keywords: text representation, text categorization, relevance frequency, category contribution, support vector machine

Abstract:

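To make text vectors express textual information accurately and to improve text categorization performance, a text feature weighting scheme that strengthens category contribution is proposed. A category contribution function for feature terms is defined using posterior probability and combined with the relevance frequency weighting factor, yielding a feature weighting scheme that accounts for both category contribution and differences in inter-category distribution. Tests on four standard corpora show that the scheme is simple to implement, characterizes the differing contributions of features to classification more accurately, improves text representation, and significantly improves text categorization performance.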
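The abstract does not give the formulas, so the following is only a rough sketch of how a weight combining relevance frequency with a posterior-based category contribution might be computed. The relevance frequency factor uses the commonly cited form rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents containing the term. The `category_contribution` function, its use of the estimate a / (a + c) for P(category | term), and the multiplicative combination in `term_weight` are all assumptions made for illustration, not the paper's actual definitions.

```python
import math


def relevance_frequency(a, c):
    """Relevance frequency: log2(2 + a / max(1, c)).

    a: number of positive-category documents containing the term
    c: number of negative-category documents containing the term
    """
    return math.log2(2 + a / max(1, c))


def category_contribution(a, c):
    """Hypothetical contribution function (an assumption, not the paper's).

    Estimates the posterior P(category | term) from document counts
    as a / (a + c); returns 0.0 when the term occurs in no document.
    """
    total = a + c
    return a / total if total else 0.0


def term_weight(tf, a, c):
    """Assumed combination: tf * contribution * relevance frequency."""
    return tf * category_contribution(a, c) * relevance_frequency(a, c)


if __name__ == "__main__":
    # A term seen twice in a document, occurring in 6 positive-category
    # and 2 negative-category training documents.
    print(term_weight(tf=2, a=6, c=2))
```

Under these assumptions, a term concentrated in the positive category receives a larger weight from both factors, while a term spread evenly across categories is damped by the contribution factor.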
