%0 Journal Article
%T String Distances for Near-duplicate Detection
%A Dănăilă, Iulia
%A Dinu, Liviu P.
%A Niculae, Vlad
%A Sulea, Octavia-Maria
%J Polibits
%D 2012
%I Scientific Electronic Library Online
%X Near-duplicate detection is important when dealing with large, noisy databases in data mining tasks. In this paper, we present the results of applying the rank distance and the Smith-Waterman distance, along with more popular string similarity measures such as the Levenshtein distance, together with a disjoint set data structure, for the problem of near-duplicate detection.
%K near-duplicate detection
%K string similarity measures
%K database
%K data mining
%U http://www.scielo.org.mx/scielo.php?script=sci_abstract&pid=S1870-90442012000100004&lng=en&nrm=iso&tlng=en