计算机应用研究 (Application Research of Computers), 2005
Study on New Pretreatment Method for Chinese Text Classification System
Abstract:
This paper presents a new text preprocessing method that applies program flow control to eliminate single-character Chinese words, pure English words, numbers, and Chinese words containing English letters or mathematical symbols from the original text vector. Consequently, the features that represent the text become pure Chinese terms. As a result, not only is the dimension of the original text vector greatly reduced, but the information content of the text vector is also substantially improved.
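
The filtering rules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the text has already been segmented into tokens, that a "pure Chinese term" means two or more characters drawn only from the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the names `is_chinese_char` and `filter_terms` are hypothetical.

```python
def is_chinese_char(ch):
    """Assumption: 'Chinese' means the basic CJK block U+4E00..U+9FFF."""
    return "\u4e00" <= ch <= "\u9fff"


def filter_terms(tokens):
    """Keep only multi-character tokens made entirely of Chinese characters."""
    kept = []
    for token in tokens:
        # Rule 1: drop single-character tokens (this also drops empty strings).
        if len(token) < 2:
            continue
        # Rule 2: drop pure English words, numbers, and mixed tokens that
        # contain letters, digits, or mathematical symbols.
        if not all(is_chinese_char(ch) for ch in token):
            continue
        kept.append(token)
    return kept


# Hypothetical segmented input, for illustration only.
tokens = ["文本", "的", "classification", "2005", "X射线", "分类", "3+2", "系统"]
print(filter_terms(tokens))  # ['文本', '分类', '系统']
```

Under these assumptions, every surviving feature is a pure Chinese term, so the text vector shrinks by however many single-character, English, numeric, and mixed tokens were present.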