


2012

Removing Fully and Partially Duplicated Records through K-Means Clustering

DOI: 10.7763/IJET.2012.V4.477


Abstract:

Record duplication is one of the most prominent problems in data warehouses. It arises when multiple databases are integrated. This research focuses on identifying both fully and partially duplicated records. In this paper we propose a de-duplicator algorithm based on converting the entire data into numeric form. For efficiency, the data mining technique of k-means clustering is applied to these numeric values, which reduces the number of comparisons among records. To identify and remove duplicated records, a divide-and-conquer technique is used to match records within each cluster, which further improves the efficiency of the algorithm.
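The abstract does not specify how records are converted to numbers, how many clusters are used, or what counts as a partial match. The following is therefore only a minimal Python sketch of the pipeline it describes, built on stated assumptions: each record becomes the sum of the character codes of its normalized fields, plain one-dimensional k-means (Lloyd's algorithm) groups those numbers, and within each cluster a recursive halving step drops any record that matches an already-kept record on at least 75% of its fields. All function names, the 0.75 threshold, and the sample records are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, ...]


def normalize(record: Record) -> Record:
    # Hypothetical cleaning step: trim whitespace and lowercase every field.
    return tuple(field.strip().lower() for field in record)


def record_to_number(record: Record) -> int:
    # Assumed numeric conversion: sum of character codes over all normalized
    # fields, so records that differ by a few characters get nearby values.
    return sum(ord(ch) for field in normalize(record) for ch in field)


def kmeans_1d(values: List[int], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's k-means on scalar values; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic start: centroids spread evenly across the sorted values.
    step = max(k - 1, 1)
    centroids = [float(ordered[i * (len(ordered) - 1) // step]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def is_duplicate(a: Record, b: Record, threshold: float = 0.75) -> bool:
    # Full duplicate: all fields match. Partial duplicate (assumed rule):
    # at least `threshold` of the fields match after normalization.
    a, b = normalize(a), normalize(b)
    if len(a) != len(b) or not a:
        return False
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a) >= threshold


def dedupe_cluster(records: List[Record]) -> List[Record]:
    """Divide and conquer: deduplicate each half recursively, then keep only
    the right-half survivors that duplicate nothing kept from the left half."""
    if len(records) <= 1:
        return list(records)
    mid = len(records) // 2
    left = dedupe_cluster(records[:mid])
    right = dedupe_cluster(records[mid:])
    return left + [r for r in right if not any(is_duplicate(r, kept) for kept in left)]


def deduplicate(records: List[Record], k: int = 2) -> List[Record]:
    if not records:
        return []
    numbers = [record_to_number(r) for r in records]
    labels = kmeans_1d(numbers, min(k, len(records)))
    result: List[Record] = []
    # Records are only compared against others in the same cluster.
    for c in sorted(set(labels)):
        cluster = [r for r, lab in zip(records, labels) if lab == c]
        result.extend(dedupe_cluster(cluster))
    return result


records = [
    ("John Smith", "12 Oak St", "Lahore", "555-0101"),
    ("JOHN SMITH ", "12 Oak St", "Lahore", "555-0101"),  # full duplicate after cleaning
    ("John Smith", "12 Oak St.", "Lahore", "555-0101"),  # partial duplicate (3 of 4 fields)
    ("Ayesha Khan", "7 Mall Rd", "Karachi", "555-0199"),
]
result = deduplicate(records, k=2)
print(result)
# → [('John Smith', '12 Oak St', 'Lahore', '555-0101'), ('Ayesha Khan', '7 Mall Rd', 'Karachi', '555-0199')]
```

Because each record is collapsed to a single number, near-duplicates end up close together on the number line, but a pair lying near a cluster boundary can still be split across clusters and never compared; this sketch does not handle that case.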
