Computer Systems & Applications (计算机系统应用), 2010
A C4.5 Decision Tree Based Algorithm for Web Pages Categorization
Abstract:
Web text categorization applies to many domains, such as information retrieval and news categorization. The decision tree algorithm is a simple and widely used categorization method. This paper investigates the basic method and process for building a web page classifier with the C4.5 decision tree, which offers merits such as high categorization precision and high categorization speed. It also proposes a C4.5 decision tree based framework for a web page classifier and implements it in a web crawler. The experimental results show that the algorithm is highly effective.
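As an illustration of the split criterion that C4.5 is known for, the following is a minimal sketch of the gain ratio computed over a binary term-presence feature. It is not the paper's implementation: the function names, the toy pages, and the terms "match" and "the" are hypothetical, and a full C4.5 classifier would also handle continuous attributes, missing values, and pruning.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain ratio of a discrete feature: information gain / split information."""
    total = len(labels)
    # Group the class labels by the feature value of each page.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted entropy of the labels after splitting on the feature.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: four pages, two categories, two term features.
page_labels = ["sports", "sports", "finance", "finance"]
contains_match = [1, 1, 0, 0]  # term "match" separates the classes perfectly
contains_the = [1, 1, 1, 0]    # term "the" is only weakly informative

best_term = max(
    [("match", contains_match), ("the", contains_the)],
    key=lambda item: gain_ratio(item[1], page_labels),
)[0]
```

In this sketch the tree builder would pick the term with the highest gain ratio as the root test, then repeat the procedure on each partition of pages.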