%0 Journal Article
%T A Survey of Approximately Duplicate Data Cleaning Method
相似重复记录清理方法研究综述
%A Ye Huanzhuo
%A Wu Di
%A 叶焕倬
%A 吴迪
%J 现代图书情报技术
%D 2010
%X This paper introduces the steps, frameworks, and metrics of approximately duplicate data cleaning. It then surveys the detection and elimination algorithms by type, together with their improved variants, and describes each algorithm's scope of application, advantages, and disadvantages. A number of data cleaning tools, such as Merge/Purge, are presented. Finally, it discusses future research topics in data cleaning and points out that incorporating knowledge and semantics into data cleaning frameworks will be an important trend.
%K approximately duplicate records
%K data cleaning
%K detection algorithms
%K elimination algorithms
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=B5EDD921F3D863E289B22F36E70174A7007B5F5E43D63598017D41BB67247657&cid=E46382710BF131B2&jid=24AADBCD0D5373C73F37F78D10E2F717&aid=43F8C8E87A65A4CF83ADFA19BD291B27&yid=140ECF96957D60B2&vid=96C778EE049EE47D&iid=9CF7A0430CBB2DFD&sid=014B591DF029732F&eid=5C3443B19473A746&journal_id=1003-3513&journal_name=现代图书情报技术&referenced_num=0&reference_num=0