%0 Journal Article
%T A C4.5 Decision Tree Based Algorithm for Web Pages Categorization
%A CAO Wei (曹薇)
%A ZHANG Nai-Zhou (张乃洲)
%J Computer Systems & Applications (计算机系统应用)
%D 2010
%X Web text categorization has applications in many domains, such as information retrieval and news categorization. Decision tree algorithms are a simple and widely used categorization method. This paper examines the basic method and process for building a web classifier with the C4.5 decision tree, which offers high categorization precision and speed. It also proposes a C4.5 decision tree based framework for a web page classifier and implements it on a web crawler. Experimental results show that the algorithm is highly effective.
%K web text categorization
%K C4.5 decision tree
%K information theory
%K information gain ratio
%K web crawler
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=D4F6864C950C88FFCE5B6C948A639E39&aid=742992AAE095A53E61996C94ABB34BEF&yid=140ECF96957D60B2&vid=2A8D03AD8076A2E3&iid=F3090AE9B60B7ED1&sid=64963996248CBF47&eid=FEF02B4635FE8227&journal_id=1003-3254&journal_name=计算机系统应用&referenced_num=0&reference_num=8