Polibits 2012

String Distances for Near-duplicate Detection

Dănăilă, Iulia; Dinu, Liviu P.; Niculae, Vlad; Sulea, Octavia-Maria

Keywords: near-duplicate detection, string similarity measures, database, data mining.

Abstract:

Near-duplicate detection is important when dealing with large, noisy databases in data mining tasks. In this paper, we present the results of applying the rank distance and the Smith-Waterman distance, along with more popular string similarity measures such as the Levenshtein distance, together with a disjoint set data structure, to the problem of near-duplicate detection.
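
The combination named in the abstract, a string distance plus a disjoint set structure, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses only the Levenshtein distance (the rank distance and the Smith-Waterman distance are not shown), and the names `levenshtein`, `DisjointSet`, `near_duplicate_groups`, the threshold `max_distance=2`, and the sample records are all assumptions made for illustration. Pairs of records whose distance falls within the threshold are merged, so each resulting set is one group of near-duplicates.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]


def near_duplicate_groups(records, max_distance=2):
    """Group records linked by pairs within max_distance (assumed threshold)."""
    ds = DisjointSet(len(records))
    # Naive all-pairs comparison: quadratic in the number of records.
    for i, j in combinations(range(len(records)), 2):
        if levenshtein(records[i], records[j]) <= max_distance:
            ds.union(i, j)
    groups = {}
    for idx, record in enumerate(records):
        groups.setdefault(ds.find(idx), []).append(record)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper.
    sample = ["John Smith", "Jon Smith", "John Smyth", "Alice Jones"]
    for group in near_duplicate_groups(sample):
        print(group)
```

Because the disjoint set merges transitively, two records can end up in the same group without being within the threshold of each other, as long as a chain of close pairs connects them. Whether that is desirable depends on the data, and the all-pairs loop would need blocking or indexing before it could be applied to a large database.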
