-  2018 

Parallel noise eliminate: A parallel noise elimination algorithm for massive text categorization

DOI: 10.1177/1748301818779047

Keywords: massive data, text categorization, noise feature reduction, error feature, key feature selection, parallelization


Abstract:

Noisy data in text is one of the main factors affecting the quality of text categorization. This paper proposes a parallel noise elimination algorithm for massive text categorization, based on principal component analysis (PCA) and term frequency-inverse document frequency (TF-IDF). Five types of noise data that may occur during the text categorization process are analyzed and summarized. Before categorization, a redundant-noise elimination algorithm based on key feature selection removes redundant noise features. During categorization, an error-noise detection algorithm handles inaccurate noise features. The proposed method is compared with four other typical noise processing methods at different noise ratios on two common corpora. The results show that the proposed method is feasible and maintains more stable and better classification performance with a lower running time.
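
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of TF-IDF-based key feature selection with a parallel counting step. It is not the authors' method: the function names (`term_counts`, `select_key_features`, `filter_docs`), the whitespace tokenizer, the "highest TF-IDF over all documents" scoring rule, and the use of a thread pool are all assumptions made for illustration, and the PCA and error-noise detection stages are omitted.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def term_counts(doc):
    # Hypothetical tokenizer: lowercase, split on whitespace, count terms.
    return Counter(doc.lower().split())


def select_key_features(docs, k, workers=2):
    # Count terms per document in parallel (assumed stand-in for the
    # paper's parallelization, which the abstract does not describe).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(term_counts, docs))

    # Document frequency: number of documents containing each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    n = len(docs)
    # Score each term by its highest TF-IDF value over all documents.
    best = {}
    for c in counts:
        total = sum(c.values())
        for term, freq in c.items():
            score = (freq / total) * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), score)

    # Keep the k highest-scoring terms; ties are broken alphabetically.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return set(ranked[:k])


def filter_docs(docs, keep):
    # Drop every token that is not a selected key feature.
    return [[t for t in d.lower().split() if t in keep] for d in docs]


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the stock market fell",
]
keep = select_key_features(docs, k=3)
filtered = filter_docs(docs, keep)
```

In this sketch a term that appears in every document (such as "the") gets an inverse document frequency of zero and is therefore among the first to be treated as redundant, which is the usual intuition behind using TF-IDF to discard uninformative features before classification.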
