%0 Journal Article %T A C4.5 Decision Tree Based Algorithm for Web Pages Categorization
%A CAO Wei %A ZHANG Nai-Zhou %J Computer Systems & Applications %D 2010 %X Web text categorization has applications in many domains, such as information retrieval and news categorization. The decision tree is a simple and widely used categorization method. This paper investigates the basic method and process of building a web page classifier with the C4.5 decision tree, whose merits include high categorization precision and high categorization speed. It further proposes a framework for a C4.5-decision-tree-based web page classifier and implements it in a web crawler. Experimental results show that the algorithm is highly effective. %K web text categorization %K C4.5 decision tree %K information theory %K information gain ratio %K web crawler
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=D4F6864C950C88FFCE5B6C948A639E39&aid=742992AAE095A53E61996C94ABB34BEF&yid=140ECF96957D60B2&vid=2A8D03AD8076A2E3&iid=F3090AE9B60B7ED1&sid=64963996248CBF47&eid=FEF02B4635FE8227&journal_id=1003-3254&journal_name=计算机系统应用&referenced_num=0&reference_num=8