Improved TFIDF feature extraction algorithm based on semantic association and information gain
Chinese title (translated): Research on an Improved TFIDF Algorithm Based on Semantic Association and Information Gain

Keywords: TFIDF (term frequency-inverse document frequency), feature extraction, semantic association, information gain, text classification


Abstract:

Both the traditional and the improved term frequency-inverse document frequency (TFIDF) algorithms ignore differences in how terms are distributed across categories during feature extraction. Because they also do not consider semantic relationships within particular categories, the selected feature words cannot describe the content of a document correctly and accurately. To select features more accurately, this paper builds on previous improvements: it introduces the semantic association of words to analyze the semantics of the text, redesigns the weighting equation, and proposes a new TFIDF algorithm that combines semantic association with information gain. The proposed algorithm compensates for the lack of semantic information in purely statistical methods. Experimental results show that the improved algorithm effectively improves text classification accuracy.
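The abstract does not give the redesigned weighting equation, so the following is only a minimal sketch of the two standard ingredients it names: classic TF-IDF and the information gain of a term with respect to category labels. The toy corpus, the function names, and the final product `tfidf * IG` are all assumptions made for illustration; they are not the paper's formula, and the semantic association component is not modelled here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF: tf(t, d) * log(N / df(t)) over tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def entropy(labels):
    """Shannon entropy (base 2) of a list of category labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(term, docs, labels):
    """IG(t) = H(C) - P(t) * H(C | t present) - P(not t) * H(C | t absent)."""
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    gain = entropy(labels)
    for subset in (with_term, without_term):
        if subset:  # skip an empty split to avoid dividing by zero
            gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain


# Hypothetical toy corpus: two "sport" and two "finance" documents.
docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["stock", "market", "team"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

# Assumed combination for illustration only (NOT the paper's equation):
# scale each TF-IDF weight by the term's information gain.
first_doc_weights = {
    term: weight * information_gain(term, docs, labels)
    for term, weight in tf_idf(docs)[0].items()
}
```

In this toy corpus, "goal" occurs only in sport documents while "team" occurs in both categories, so "goal" receives the higher information gain. This is the kind of category-distribution signal that plain TF-IDF does not capture.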
