%0 Journal Article
%T Improved TFIDF feature extraction algorithm based on semantic association and information gain
%A XU Ke (许珂)
%A MENG Zu-qiang (蒙祖强)
%A LIN Qi-feng (林啓峰)
%J 计算机应用研究 (Application Research of Computers)
%D 2012
%X Both traditional and improved term frequency-inverse document frequency (TFIDF) algorithms ignore how feature distributions differ across categories during feature extraction. Because they also fail to consider semantic relationships within particular categories, the selected feature words cannot describe document content correctly and accurately. To select features more accurately, this paper builds on previous improvements: it introduces the semantic association of words to analyze the semantics of the text, redesigns the weight equation, and proposes a new TFIDF algorithm that combines semantic association with information gain. The proposed algorithm compensates for the lack of semantic information in purely statistical methods. Experimental results show that the improved algorithm effectively improves text classification accuracy.
%K TFIDF
%K feature extraction
%K semantic association
%K information gain
%K text classification
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=C7A4D0D008E83819C9555652C6140FC6&yid=99E9153A83D4CB11&vid=771469D9D58C34FF&iid=0B39A22176CE99FB&sid=2E15A588990CC690&eid=A3F93694B058F76C&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=0&reference_num=14