New Technology of Library and Information Service (现代图书情报技术), 2007
Study of Statistics-rule Based Hierarchical Web Page Classification
Abstract:
Statistics-based classification methods are commonly used in hierarchical Web classification. However, the precision of statistics-based methods often drops when categories are very similar to each other, because their features overlap. Due to the nature of hierarchical Web classification, categories sharing the same parent (e.g., leaf categories in the hierarchy) are often very similar to each other. To improve classification precision, the paper proposes using rule-based classification methods on top of statistics-based methods in hierarchical Web classification. Experiments show that our methods perform well on our education Web collections.
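The abstract does not say which statistical classifier or which rules the paper uses, so the following is only a minimal sketch of the general idea of layering rules over a statistical score. Everything in it is an assumption made for illustration: the cosine-similarity scorer, the toy hierarchy and training text, the keyword rules, the `margin` threshold, and all function names.

```python
import math
from collections import Counter

# Hypothetical two-level hierarchy: parent -> leaf categories.
HIERARCHY = {"education": ["admissions", "courses"]}

# Toy training text per leaf category (invented for illustration).
TRAINING = {
    "admissions": "apply application deadline tuition student program",
    "courses": "syllabus lecture credit student program semester",
}

# Hypothetical hand-written rules: (keyword, category it points to).
RULES = [("deadline", "admissions"), ("syllabus", "courses")]


def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


CENTROIDS = {cat: vectorize(text) for cat, text in TRAINING.items()}


def statistical_scores(text):
    """Statistical step: rank leaf categories by similarity, best first."""
    vec = vectorize(text)
    return sorted(
        ((cosine(vec, cen), cat) for cat, cen in CENTROIDS.items()), reverse=True
    )


def parent_of(category):
    for parent, leaves in HIERARCHY.items():
        if category in leaves:
            return parent
    return None


def classify(text, margin=0.2):
    """Rule step: only consulted when two siblings score within `margin`."""
    ranked = statistical_scores(text)
    (best_score, best), (second_score, second) = ranked[0], ranked[1]
    siblings = parent_of(best) == parent_of(second)
    if siblings and best_score - second_score < margin:
        tokens = set(text.lower().split())
        for keyword, category in RULES:
            if keyword in tokens and category in (best, second):
                return category
    return best
```

In this sketch the rules act only as a tie-breaker between similar sibling categories; when the statistical scores are clearly separated, the statistical decision is returned unchanged. Whether the paper applies its rules in this way is not stated in the abstract.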