Journal of Computer Applications (计算机应用), 2009

New algorithm based on repeat sequence deletion
Chinese title (translated): Analysis of two algorithms for repeated-word extraction

JIANG Hua, YIN Bo

Keywords: repeated sequences, repeated segments, suffix tree


Abstract:

Focusing on current de-duplication algorithms, two repeated sequence (RS) extraction algorithms were compared and analyzed. Because STC performs well in time cost, while the inverted index method is superior in space complexity, STC was used to improve the RS algorithm. Experimental results show that this method finds similar Web pages efficiently. The algorithm reaches high precision in mono-language de-duplication of Web pages, and it also reaches its maximum precision when applied to the deletion of duplicated Web pages.
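To make the idea of repeated sequence extraction concrete, the following is a minimal sketch of the inverted index approach mentioned in the abstract. It is not the authors' implementation: the function names `repeated_sequences` and `ngram_overlap`, the fixed n-gram length `n`, and the use of Jaccard similarity to compare two pages are all assumptions made for illustration. The STC-based variant described in the abstract is not shown.

```python
from collections import defaultdict


def _ngrams(tokens, n):
    """Yield (start position, n-gram tuple) pairs for a token list."""
    for i in range(len(tokens) - n + 1):
        yield i, tuple(tokens[i:i + n])


def repeated_sequences(tokens, n=2, min_count=2):
    """Map each n-gram that occurs at least min_count times to its start positions.

    The dictionary built here acts as a small inverted index:
    n-gram -> list of positions where it starts.
    """
    index = defaultdict(list)
    for pos, gram in _ngrams(tokens, n):
        index[gram].append(pos)
    return {gram: positions for gram, positions in index.items()
            if len(positions) >= min_count}


def ngram_overlap(tokens_a, tokens_b, n=2):
    """Jaccard similarity of the n-gram sets of two token lists (hypothetical page score)."""
    grams_a = {gram for _, gram in _ngrams(tokens_a, n)}
    grams_b = {gram for _, gram in _ngrams(tokens_b, n)}
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    page = "the cat sat on the mat and the cat ran".split()
    print(repeated_sequences(page, n=2))
    print(ngram_overlap(page, page, n=2))
```

Under these assumptions, two pages whose overlap score exceeds some chosen threshold would be treated as near-duplicates. The index stores one position list per distinct n-gram, so the space it needs grows with the number of distinct n-grams.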
