


Content Oriented Automatic Text Categorization

Keywords: The aim of this paper is to propose deep parallelism may be established between and an Automatic


Abstract:

The project is to implement a web spam classifier which, given a web page, analyzes its features and tries to determine whether the page is spam or not. The efficiency of the classifier will be compared to the results of spam detection in text datasets using a Naïve Bayes classifier. Text representation is the task of transforming the content of a textual document into a vector in the term space so that the document can be recognized and classified by a computer or a classifier. Different terms (i.e. words, phrases, or any other indexing units used to identify the contents of a text) have different importance in a text. Term weighting methods assign appropriate weights to the terms to improve the performance of text categorization. In this study, we investigate several widely used unsupervised (traditional) and supervised term weighting methods on benchmark data collections in combination with NLP and clustering algorithms. In consideration of the distribution of relevant documents in the collection, we propose a new simple supervised term weighting method, i.e. tf.rf, to improve the terms' discriminating power for the text categorization task. The tf.rf method shows a consistently better performance, while other supervised term weighting methods based on information theory or statistical metrics perform the worst in all experiments. On the other hand, the popularly used tf.idf method has not shown a uniformly good performance across different data sets.
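
To make the contrast between unsupervised and supervised term weighting concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the tf.rf formula; the form used here, rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents containing the term, is the one commonly given for tf.rf in the term weighting literature and is an assumption. The function names, the whitespace tokenizer, and the toy documents in the usage note are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real NLP preprocessing.
    return text.lower().split()


def tf_idf(docs):
    """Unsupervised weighting: raw term frequency times log(N / df)."""
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    # df[t] = number of documents that contain term t at least once.
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def tf_rf(docs, labels, positive):
    """Supervised weighting (assumed form): tf * log2(2 + a / max(1, c))."""
    tokenized = [tokenize(d) for d in docs]
    a = Counter()  # positive-category documents containing the term
    c = Counter()  # negative-category documents containing the term
    for toks, label in zip(tokenized, labels):
        target = a if label == positive else c
        target.update(set(toks))
    rf = {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}
    return [
        {t: tf * rf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
```

As a usage illustration, take four toy pages labelled `["spam", "spam", "ham", "ham"]` with texts `"cheap pills cheap"`, `"cheap offer now"`, `"meeting notes now"`, and `"project meeting agenda"`. Under the assumed formula, `cheap` occurs in two spam pages and no non-spam pages, so its rf is log2(2 + 2/1) = 2, while `meeting` occurs only in non-spam pages and gets rf = log2(2) = 1. tf.idf, by contrast, ignores the labels and gives `cheap` and `meeting` the same idf, since each appears in two of the four pages.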

