%0 Journal Article
%T Approximately duplicated records examining method and its application in ETL of data warehouse
%A ZHANG Yong (张永)
%A CHI Zhong-xian (迟忠先)
%A YAN De-qin (闫德勤)
%J Journal of Computer Applications (计算机应用)
%D 2006
%I
%X Detecting and eliminating approximately duplicated records is one of the main problems to be solved in data cleaning and in improving data quality. Position-coding technology was introduced into the ETL process of a data warehouse, and a novel algorithm for detecting approximately duplicated records, named the Position-Coding Method (PCM), was presented. The algorithm applies to Chinese character sets as well as Western character sets. Experimental comparison with previous work indicates that the method is effective.
%K ETL
%K position coding
%K data warehouse
%K approximately duplicated records
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=831E194C147C78FAAFCC50BC7ADD1732&aid=A9468141F5F384F0&yid=37904DC365DD7266&vid=96C778EE049EE47D&iid=E158A972A605785F&sid=750AE535ABE3D62A&eid=8047434EAE0B2346&journal_id=1001-9081&journal_name=计算机应用&referenced_num=4&reference_num=8