%0 Journal Article
%T String Distances for Near-duplicate Detection
%A Dănăilă, Iulia
%A Dinu, Liviu P.
%A Niculae, Vlad
%A Sulea, Octavia-Maria
%J Polibits
%D 2012
%I Scientific Electronic Library Online
%X Near-duplicate detection is important when dealing with large, noisy databases in data mining tasks. In this paper, we present the results of applying the rank distance and the Smith-Waterman distance, along with more popular string similarity measures such as the Levenshtein distance, together with a disjoint set data structure, to the problem of near-duplicate detection.
%K near-duplicate detection
%K string similarity measures
%K database
%K data mining
%U http://www.scielo.org.mx/scielo.php?script=sci_abstract&pid=S1870-90442012000100004&lng=en&nrm=iso&tlng=en