计算机应用 (Journal of Computer Applications), 2006
New text categorization method based on the frequency of topic words
Abstract:
The word-frequency matrix currently used in text categorization is high-dimensional and extremely sparse, and both properties make computation difficult. To address this problem, a new text categorization method based on topic-word frequency is proposed, drawing on the selections made by search engine users. The approach first filters new-concept topic words by a statistical method and then applies the fuzzy c-means (FCM) clustering algorithm to the documents, using the frequency of topic words rather than the frequency of single words as the feature. The method performed well in experiments. Furthermore, it was compared in several respects with a cluster-based text categorization method, and some useful conclusions about implementation and application were reached.
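As an illustration only, the pipeline described in the abstract (topic-word frequency features followed by FCM clustering) might be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the topic-word list is supplied by hand because the abstract does not specify the statistical filtering step, the documents are toy data, the helper names (`topic_word_features`, `fcm`) are hypothetical, and the centers are seeded deterministically by a farthest-point rule rather than by the random initialization often used with FCM.

```python
import math

def topic_word_features(docs, topic_words):
    """Count non-overlapping occurrences of each topic word (phrase) per document."""
    return [[float(doc.lower().count(t.lower())) for t in topic_words] for doc in docs]

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _init_centers(points, c):
    # Assumption: deterministic farthest-point seeding, chosen for reproducibility.
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(_dist(p, q) for q in centers))
        centers.append(list(far))
    return centers

def _memberships(points, centers, m):
    # u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1))
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [max(_dist(p, q), 1e-12) for q in centers]  # clamp to avoid division by zero
        u.append([1.0 / sum((dj / dl) ** power for dl in d) for dj in d])
    return u

def _update_centers(points, u, m):
    # c_j = sum_i u_ij^m * x_i / sum_i u_ij^m
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append([sum(wi * p[k] for wi, p in zip(w, points)) / total
                        for k in range(dim)])
    return centers

def fcm(points, c=2, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means: alternate membership and center updates until centers settle."""
    centers = _init_centers(points, c)
    for _ in range(iters):
        u = _memberships(points, centers, m)
        new_centers = _update_centers(points, u, m)
        shift = max(_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return _memberships(points, centers, m), centers

# Toy data (hypothetical): two topic words, four short documents.
topic_words = ["search engine", "neural network"]
docs = [
    "search engine ranking and search engine indexing",
    "a search engine crawls the web",
    "neural network layers in a deep neural network",
    "training a neural network",
]
features = topic_word_features(docs, topic_words)
memberships, centers = fcm(features, c=2)
# Harden the fuzzy memberships by taking the most likely cluster per document.
labels = [row.index(max(row)) for row in memberships]
```

Each document is represented by as many features as there are topic words, not by one feature per vocabulary word, which is the dimensionality reduction the abstract refers to. FCM returns a degree of membership in every cluster for each document, so a document covering several topics need not be forced into a single category.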