%0 Journal Article
%T Optimization for Data De-duplication Algorithm Based on Storage Environment Aware
基于存储环境感知的重复数据删除算法优化
%A ZHOU Jing-li
%A NIE Xue-jun
%A QIN Lei-hua
%A LIU Ke
%A ZHU Jian-feng
%A WANG Yu
%A 周敬利
%A 聂雪军
%A 秦磊华
%A 刘科
%A 朱建峰
%A 王宇
%J 计算机科学
%D 2011
%I
%X Storage applications such as backup and archiving generate ever more duplicate data, wasting storage space and energy, so removing duplicate data has become an active research topic. CDC (Content-Defined Chunking) is a prevalent algorithm for data de-duplication and can be applied in various environments; however, it does not take into account characteristics that are specific to an individual environment and can influence its results. We studied the application of CDC in storage systems and propose two constraints for determining CDC parameters: (1) determining parameters such as the average block size based on the block organization of the storage devices; (2) determining block boundaries based on the distribution of candidate boundaries. The results indicate that, compared with CDC alone without these constraints, the two constraints achieve a 16.3% higher compression ratio on 4 data sets.
%K Data de-duplication
%K Storage environment aware
%K CDC
%K File system
%K Block boundary
重复数据删除,存储环境感知,CDC,文件系统,分块边界
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=D879386B786F8AD04B346E4155AB3419&yid=9377ED8094509821&vid=16D8618C6164A3ED&iid=0B39A22176CE99FB&sid=E84BBBDDD74F497C&eid=5D71B28100102720&journal_id=1002-137X&journal_name=计算机科学&referenced_num=0&reference_num=15