|
现代图书情报技术 2010
A Survey of Approximately Duplicate Data Cleaning Method
|
Abstract:
This paper introduces the steps, frameworks and metrics of approximately duplicate data cleaning. Then, the detect algorithms and the elimination algorithms are surveyed essentially,according to type and their improvement methods, and the algorithms usage scope and their advantages and disadvantages are given. Many data cleaning tools are presented, such as Merge/Purge. Finaly, it discusses the future research topics in data cleaning and points out that the concept of knowledge and semantic used in the framework of data cleaning will be an important trend.