Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2010
Feature selection combined ODF with discernible sets
Abstract:
In Chinese text categorization, the choice of categorization algorithms is restricted by the very large number of Chinese terms. Feature selection is therefore a core research topic in text categorization. This paper first presents an optimal document frequency (ODF) method, then introduces rough sets and a new attribute reduction algorithm based on discernible sets. Finally, it combines the attribute reduction algorithm with the ODF to propose a comprehensive feature selection method. The combined method uses the ODF to filter out some terms and reduce the sparsity of the feature space, and then applies the attribute reduction algorithm to eliminate redundancy and obtain a more representative feature subset. The experimental results show that the combined method achieves high accuracy and recall.
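The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not define the ODF or the reduction procedure, so the sketch assumes (1) a plain document-frequency band filter with hypothetical thresholds `min_df` and `max_df` as a stand-in for the ODF, and (2) a greedy cover of the discernible sets (the terms that distinguish each pair of documents from different classes) as a stand-in for the attribute reduction step. All function names are illustrative.

```python
from itertools import combinations


def df_filter(docs, min_df, max_df):
    """Stage 1 (assumed stand-in for ODF): keep terms whose document
    frequency lies in the band [min_df, max_df]."""
    df = {}
    for terms in docs:
        for t in set(terms):
            df[t] = df.get(t, 0) + 1
    return sorted(t for t, n in df.items() if min_df <= n <= max_df)


def discernible_sets(docs, labels, terms):
    """For every pair of documents with different labels, collect the
    terms present in exactly one of the two documents."""
    sets = []
    for i, j in combinations(range(len(docs)), 2):
        if labels[i] != labels[j]:
            a, b = set(docs[i]), set(docs[j])
            diff = {t for t in terms if (t in a) != (t in b)}
            if diff:
                sets.append(diff)
    return sets


def greedy_reduct(sets):
    """Stage 2 (assumed heuristic): repeatedly pick the term that occurs in
    the most uncovered discernible sets until every set is covered."""
    remaining = list(sets)
    chosen = []
    while remaining:
        counts = {}
        for s in remaining:
            for t in s:
                counts[t] = counts.get(t, 0) + 1
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


def select_features(docs, labels, min_df, max_df):
    """Filter terms by document frequency, then remove redundant terms."""
    terms = df_filter(docs, min_df, max_df)
    return greedy_reduct(discernible_sets(docs, labels, terms))


if __name__ == "__main__":
    # Toy corpus: each document is a list of already segmented terms.
    docs = [
        ["ball", "team", "win"],
        ["ball", "team", "score"],
        ["stock", "market", "win"],
        ["stock", "market", "price"],
    ]
    labels = ["sport", "sport", "finance", "finance"]
    print(select_features(docs, labels, min_df=2, max_df=3))
```

On this toy corpus the frequency filter drops the terms that occur in only one document, and the reduction step then keeps a single term, because one term is enough to separate every pair of documents from different classes. A greedy cover does not guarantee a minimal reduct; it is used here only because it is short and deterministic.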