%0 Journal Article
%T Associative Web Document Classification Based on Word Mixed Weight
基于特征词复合权重的关联网页分类
%A LAN Jun
%A SHI Hua-ji
%A LI Xing-yi
%A XU Min
%A 兰均
%A 施化吉
%A 李星毅
%A 徐敏
%J 计算机科学
%D 2011
%X Classification based on association rules has two shortcomings when applied to Web documents. First, the method processes a Web document as plain text and ignores the information carried by the page's HTML tags. Second, the items of the association rules are either just the words in the Web page, with no account of their weights, or carry weights quantified only by word frequency, which ignores how important a word's location in the Web document is. This paper therefore proposes a new, efficient method. It calculates each word's mixed weight from HTML tag feature information, then mines classification rules based on these mixed weights to classify Web pages. Experimental results show that this approach performs better than traditional associative classification methods.
%K Web document classification
%K Association rules
%K Location feature
%K Mixed weight
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=B48648DF8F599BD56A784FC1A4873521&yid=9377ED8094509821&vid=16D8618C6164A3ED&iid=38B194292C032A66&sid=3E0812ED84A7B31D&eid=6235172E4DDBA109&journal_id=1002-137X&journal_name=计算机科学&referenced_num=0&reference_num=12