Detection and elimination of similar Web pages based on text structure
基于网页文本结构的网页去重

Keywords: layer fingerprint, text structure, detection and elimination of similar Web pages
层次指纹, 文本结构, 网页去重

Abstract:

Similar Web pages returned by search engines not only waste storage resources but also increase the burden on Web users. A dynamic method for detecting similar Web pages is proposed. In this method, the text of each Web page is represented as a catalogue structure tree, built according to the features of similar Web pages and of Web pages themselves. A dynamic algorithm for extracting text features and a layer fingerprint algorithm for calculating the degree of similarity were implemented. The experimental results show that completely similar Web pages are detected accurately, and partly similar Web pages are detected exactly.
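The abstract does not give the details of the layer fingerprint algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that page text has already been parsed into a tree of headings and paragraphs, that each node's fingerprint is an MD5 hash of its own normalized text combined with its children's fingerprints, and that the degree of similarity is the mean Jaccard overlap of fingerprints per tree layer. The names `Node`, `fingerprint`, `layers`, and `similarity` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical catalogue structure tree (heading or paragraph)."""
    text: str
    children: list["Node"] = field(default_factory=list)


def fingerprint(node: Node) -> str:
    """Hash the node's normalized text together with its children's fingerprints."""
    h = hashlib.md5()
    # Assumed normalization: collapse whitespace and lowercase.
    h.update(" ".join(node.text.split()).lower().encode("utf-8"))
    for child in node.children:
        h.update(fingerprint(child).encode("ascii"))
    return h.hexdigest()


def layers(root: Node) -> list[set[str]]:
    """Collect the set of fingerprints found at each depth of the tree."""
    result = []
    level = [root]
    while level:
        result.append({fingerprint(n) for n in level})
        level = [c for n in level for c in n.children]
    return result


def similarity(a: Node, b: Node) -> float:
    """Return 1.0 when the root fingerprints match, else the mean Jaccard
    overlap of the layers below the root (an assumed similarity measure)."""
    la, lb = layers(a), layers(b)
    if la[0] == lb[0]:
        return 1.0  # identical root fingerprint: completely similar pages
    depth = max(len(la), len(lb))
    if depth == 1:
        return 0.0  # two single-node trees with different text
    scores = []
    for i in range(1, depth):
        sa = la[i] if i < len(la) else set()
        sb = lb[i] if i < len(lb) else set()
        scores.append(len(sa & sb) / len(sa | sb))
    return sum(scores) / len(scores)
```

In this sketch a matching root fingerprint settles the completely similar case with a single comparison, and the per-layer comparison is needed only for partly similar pages. A real system would likely use a different hash, normalization, and weighting of layers.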
