New algorithm based on repeat sequence deletion
(Original Chinese title: Analysis of Two Algorithms for Repeated-Word Extraction)

Keywords: repeated sequences, repeated segments, suffix tree

Abstract:

Focusing on current de-duplication algorithms, two repeated-sequence (RS) extraction algorithms were compared and analyzed. Because STC performs well in terms of time cost, whereas the inverted index method is superior in terms of space complexity, STC was used to improve the RS algorithm. Experimental results show that this method finds similar Web pages efficiently. The algorithm achieves high precision in mono-language deletion of duplicated Web pages, and it also reaches its maximum precision when applied to deletion of duplicated Web pages.
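
As a rough illustration of the inverted-index style of repeated-sequence extraction mentioned above, the following minimal Python sketch indexes fixed-length word n-grams and keeps those that occur at least twice. It is not the paper's STC-improved RS algorithm: the function names `repeated_ngrams` and `ngram_overlap`, the fixed window size `n`, and the Jaccard-style similarity score are all assumptions made for this example.

```python
from collections import defaultdict


def repeated_ngrams(tokens, n=3):
    """Return word n-grams that occur at least twice, with their start positions.

    Hypothetical helper: a plain inverted index (n-gram -> positions),
    standing in for a repeated-sequence extractor.
    """
    index = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        index[tuple(tokens[i:i + n])].append(i)
    return {gram: pos for gram, pos in index.items() if len(pos) >= 2}


def ngram_overlap(doc_a, doc_b, n=3):
    """Jaccard overlap of two token lists' n-gram sets.

    Assumed similarity score for near-duplicate detection; the paper's
    actual scoring is not described in the abstract.
    """
    grams_a = {tuple(doc_a[i:i + n]) for i in range(len(doc_a) - n + 1)}
    grams_b = {tuple(doc_b[i:i + n]) for i in range(len(doc_b) - n + 1)}
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    page = "a b c a b c d".split()
    print(repeated_ngrams(page, n=3))
    print(ngram_overlap(page, page, n=3))
```

The index stores one entry per distinct n-gram, which reflects the space advantage the abstract attributes to the inverted index method. A suffix-tree approach would instead find repeats of every length in one pass, at the cost of a larger structure.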
