Improvement of the Feature Weighting Algorithm in Text Classification

pp. 95-98

Keywords: text classification, feature weighting, TFIDF, class discrimination, BOR-TFIDF

Abstract:

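TFIDF is a commonly used method for representing the feature weights of documents. It is simple to apply, but it ignores how a feature term is distributed across the categories, so it does not truly reflect the term's contribution to distinguishing each class. To address this shortcoming, this paper proposes BOR-TFIDF, which readjusts each feature term's discriminative power with respect to each category, that is, it corrects the weight of each feature term, and verifies its effectiveness with a classifier. The method outperforms the original TFIDF algorithm, and the experiments show that the improved strategy is feasible.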
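
The limitation described above can be illustrated with a minimal sketch of plain TFIDF. It assumes the common tf × log(N/df) form, which the abstract does not specify, and uses a hypothetical toy corpus. The BOR-TFIDF correction itself is not shown, because its formula is not given in the abstract.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    This is one common TFIDF variant, assumed here for illustration only.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus: documents 0-1 are "sports", 2-3 are "finance".
docs = [
    ["goal", "team", "news"],
    ["goal", "coach"],
    ["stock", "market", "news"],
    ["stock", "bank"],
]
labels = ["sports", "sports", "finance", "finance"]

weights = tfidf(docs)
# "goal" occurs only in sports documents, while "news" occurs once in each
# class. Both appear in two documents, so plain TFIDF weights them equally.
print(weights[0]["goal"], weights[0]["news"])
```

In this sketch the class labels never enter the computation, so a term concentrated in one class and a term spread evenly across classes receive the same weight whenever their document frequencies match. That is the gap a class-aware reweighting is meant to close.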

