Application Research of Computers (计算机应用研究), 2012

Improved TFIDF feature extraction algorithm based on semantic association and information gain

XU Ke (许珂), MENG Zu-qiang (蒙祖强), LIN Qi-feng (林啓峰)

Keywords: TFIDF, feature extraction, semantic association, information gain, text classification

Abstract:

Both the traditional and the previously improved term frequency-inverse document frequency (TFIDF) algorithms ignore differences in how features are distributed across categories during feature extraction. Because they also do not consider semantic relationships within specific categories, the selected feature words may fail to describe document content correctly and accurately. To select features more accurately, this paper builds on those earlier improvements: it introduces semantic association between words to analyze text semantics, redesigns the weighting equation, and proposes a new TFIDF algorithm that combines semantic association with information gain. The proposed algorithm compensates for the lack of semantic information in purely statistical methods. Experimental results show that the improved algorithm effectively increases text classification accuracy.
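The abstract does not reproduce the redesigned weighting equation, so the sketch below only illustrates the two standard ingredients it builds on: classic TF-IDF, computed as (term count / document length) × ln(N / df), and document-level information gain of a term with respect to the category labels. The multiplicative `tfidf_ig` weighting is a hypothetical, commonly used TF-IDF × IG variant chosen for illustration, not the authors' formula, and it leaves out the semantic-association component entirely. The function names and the toy corpus are assumptions.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens with a category label.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
]
labels = ["sports", "sports", "finance", "finance"]


def tfidf(docs):
    """Classic TF-IDF: (count / doc length) * ln(N / df), one dict per document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency per term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def entropy(counts):
    """Shannon entropy in bits of a class-count distribution (zero counts skipped)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t), using term presence per document."""
    n = len(docs)
    classes = sorted(set(labels))
    h_c = entropy([labels.count(c) for c in classes])
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]
    conditional = 0.0
    for subset in (with_t, without_t):
        if subset:  # an empty split has probability zero and contributes nothing
            conditional += len(subset) / n * entropy([subset.count(c) for c in classes])
    return h_c - conditional


def tfidf_ig(docs, labels):
    """Illustrative TF-IDF x IG weighting (assumption, not the paper's equation)."""
    vocab = {t for doc in docs for t in doc}
    ig = {t: information_gain(docs, labels, t) for t in vocab}
    return [{t: w * ig[t] for t, w in doc_w.items()} for doc_w in tfidf(docs)]


weights = tfidf(docs)
combined = tfidf_ig(docs, labels)
print(round(information_gain(docs, labels, "team"), 4))  # 1.0
print(round(information_gain(docs, labels, "ball"), 4))  # 0.3113
print(weights[0]["ball"] > weights[0]["team"])    # True: plain TF-IDF favors the rarer "ball"
print(combined[0]["team"] > combined[0]["ball"])  # True: IG reranks toward "team"
```

In this toy corpus, plain TF-IDF ranks "ball" above "team" in the first document only because "ball" is rarer, even though "team" separates the two categories perfectly. Weighting by information gain reverses that order. This is the kind of category-distribution blindness the abstract attributes to TF-IDF.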
