Detection and elimination of similar Web pages based on text structure and extraction of long sentences
基于正文结构和长句提取的网页去重算法

Keywords: detection and elimination of similar Web pages, text structure tree, extraction of long sentences, layer fingerprint
网页去重, 正文结构树, 长句提取, 层次指纹


Abstract:

Drawing on the similarity characteristics and the text structure of Web pages, this paper proposes a dynamic, hierarchical, and robust algorithm for detecting and eliminating similar Web pages. The method represents the body text of each Web page as a text structure tree. It then applies a dynamic algorithm to extract features from the text and a layer fingerprint algorithm to calculate similarity. Feature extraction relies on a long-sentence extraction algorithm, which is what gives the method its robustness. Experimental results show that the method accurately detects both completely similar and partly similar Web pages.
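The abstract gives no implementation details, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes a page is already split into text blocks (a flat stand-in for the text structure tree), keeps the longest sentences of each block as features, hashes them with MD5, and compares pages by Jaccard overlap. The names `long_sentences`, `block_fingerprint`, `page_fingerprint`, and `similarity`, the thresholds `MIN_SENTENCE_CHARS` and `TOP_K`, and the choice of MD5 and Jaccard are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical parameters; the abstract does not specify any of them.
MIN_SENTENCE_CHARS = 20  # assumed threshold for a "long" sentence
TOP_K = 3                # assumed number of long sentences kept per block


def long_sentences(text, k=TOP_K, min_chars=MIN_SENTENCE_CHARS):
    """Return up to k of the longest sentences in text, normalised."""
    parts = re.split(r"[.!?。！？]+", text)
    cleaned = [re.sub(r"\s+", " ", p).strip().lower() for p in parts]
    candidates = [s for s in cleaned if len(s) >= min_chars]
    return sorted(candidates, key=len, reverse=True)[:k]


def block_fingerprint(block):
    """Lower layer: the set of MD5 digests of one block's long sentences."""
    return {
        hashlib.md5(s.encode("utf-8")).hexdigest()
        for s in long_sentences(block)
    }


def page_fingerprint(blocks):
    """Upper layer: the union of the fingerprints of all blocks on a page."""
    digests = set()
    for block in blocks:
        digests |= block_fingerprint(block)
    return digests


def similarity(blocks_a, blocks_b):
    """Jaccard overlap of two page fingerprints, in the range [0.0, 1.0]."""
    fa, fb = page_fingerprint(blocks_a), page_fingerprint(blocks_b)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because short sentences are discarded before hashing, small differences such as navigation text or a changed link label do not alter the fingerprint, which is one plausible reading of the robustness the abstract attributes to long-sentence extraction. Pages that share only some blocks score strictly between 0 and 1, which corresponds to the partly similar case.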
