An Algorithm for Noise Reduction in Web Pages Based on a Group of Content-related Rules
Original Chinese title: 一种基于内容规则的网页去噪算法

Keywords: noise reduction in Web pages (Web page purification), Levenshtein distance (edit distance)

Abstract:

This paper presents a new algorithm for eliminating noise in Web pages, based on a group of content-related rules. First, the authors present an algorithm that peels off noise by iteratively comparing the tables on the same level of the page's table tree. Next, they present an algorithm that evaluates the similarity between the topic of an anchor text and the content of the page. Because the new algorithm takes semantic aspects of the Web pages into account to some extent, it achieves higher accuracy than a purely rule-based algorithm while requiring lower time complexity. The experimental results indicate that the algorithm performs effectively when purifying large numbers of Web pages.
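
The keywords name Levenshtein (edit) distance, but the abstract does not say how the distance enters the similarity evaluation. The following is therefore only a minimal sketch of the standard edit-distance computation, not the authors' method. The `similarity` helper and its length-based normalization are assumptions added for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance.

    Counts the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization (an assumption, not from the paper):
    1.0 for identical strings, 0.0 when every position must change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` returns 3. How such a score would be compared between an anchor text and the page content, and with what threshold, is not stated in the abstract and would have to come from the full paper.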
