2017

Research on Web Page Classification Based on Structural and Text Features

DOI: 10.3969/j.issn.0253-2778.2017.04.002

Keywords: web page classification, naive Bayes, atomic feature, joint feature


Abstract:

Web pages contain abundant information resources, and classifying them makes it easier to extract and manage their content and to present it to readers. To handle the complex structural information and rich text content of web pages, a classification method based on both page text and page structure is proposed. Using the structural characteristics and text information of web pages related to mass innovation (众创), the method combines joint features with atomic features for classification. Experiments indicate that the method is feasible to some extent and achieves higher precision and recall than classification using text information alone.
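As a rough illustration of the idea in the abstract, the following is a minimal sketch of a naive Bayes classifier that uses atomic and joint features together. The abstract does not define these features, so the definitions here are assumptions made only for illustration: an atomic feature is taken to be a single word or a single HTML tag, and a joint feature is taken to be a (tag, word) pair. The function and class names, the toy pages, and the class labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def extract_features(page):
    """Turn a page, given as (tag, word) pairs, into a list of feature strings.

    The feature definitions are assumptions for illustration only.
    """
    feats = []
    for tag, word in page:
        feats.append(f"word={word}")            # atomic text feature (assumed)
        feats.append(f"tag={tag}")              # atomic structural feature (assumed)
        feats.append(f"tag={tag}|word={word}")  # joint structure + text feature (assumed)
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, pages, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for page, label in zip(pages, labels):
            feats = extract_features(page)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, page):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for f in extract_features(page):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: each page is a list of (HTML tag, word) pairs.
train_pages = [
    [("title", "startup"), ("p", "funding")],
    [("title", "football"), ("p", "match")],
]
train_labels = ["innovation", "sports"]

model = NaiveBayes().fit(train_pages, train_labels)
print(model.predict([("title", "startup")]))
```

In this sketch the joint feature lets the same word carry different weight depending on where it appears in the page, which is one plausible reading of why combining structure with text could outperform text alone; the paper's actual feature design may differ.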
