%0 Journal Article
%T Deduplication model based on file-similarity clustering
%A WANG Can
%A QIN Zhi-guang
%A WANG Juan
%A CAI Bo
%J Application Research of Computers
%D 2012
%I 
%X To resolve the locality-dependence and multi-node-dependence problems of current throughput-improvement methods for deduplication systems, this paper proposed a deduplication model based on file-similarity clustering. The model expanded the traditional flat index structure into a spatial structure. Based on Broder's theorem, it kept only a small number of the most representative indices in RAM. It partitioned the index horizontally and distributed the partitions across several fully autonomous storage nodes. Experimental results indicate that, in a large-scale cloud-storage environment, the model effectively improves deduplication performance and average throughput while keeping data loads balanced, so the model scales smoothly.
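A minimal Python sketch of the routing idea summarized in the abstract, written under stated assumptions rather than as the authors' implementation: files are split into fixed-size 4 KiB chunks (the chunking method is not given in this record), chunks are fingerprinted with SHA-1, and the minimum chunk fingerprint serves as the file's representative in-RAM index entry. Broder's theorem states that, for a min-wise independent permutation, two sets share the same minimum element with probability equal to their Jaccard similarity; here a cryptographic hash stands in for that random permutation, so similar files tend to receive the same representative. Routing a file to node `int(rep, 16) % num_nodes` is one hypothetical way to partition the index horizontally, and the names `chunk_hashes`, `representative`, `StorageNode` and `DedupCluster` are illustrative only.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4096  # assumption: fixed-size chunking; real systems often use content-defined chunking


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> set[str]:
    """Split data into fixed-size chunks and return the set of their SHA-1 fingerprints."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def representative(hashes: set[str]) -> str:
    """Min-hash representative of a file's chunk set (Broder's theorem:
    P[min(A) == min(B)] equals the Jaccard similarity of A and B)."""
    return min(hashes)


@dataclass
class StorageNode:
    # Hypothetical in-RAM index: representative ID -> chunk IDs stored in that similarity cluster.
    bins: dict[str, set[str]] = field(default_factory=dict)

    def store(self, rep: str, hashes: set[str]) -> int:
        """Deduplicate against the file's cluster only; return how many new chunks were stored."""
        known = self.bins.setdefault(rep, set())
        new_chunks = hashes - known
        known |= new_chunks
        return len(new_chunks)


class DedupCluster:
    """Hypothetical set of autonomous nodes sharing a horizontally partitioned index."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [StorageNode() for _ in range(num_nodes)]

    def node_for(self, rep: str) -> StorageNode:
        # Partition by representative ID, so similar files meet on one node
        # and no cross-node index lookup is needed.
        return self.nodes[int(rep, 16) % len(self.nodes)]

    def put(self, data: bytes) -> int:
        """Store a file and return the number of chunks that were not duplicates."""
        hashes = chunk_hashes(data)
        if not hashes:  # empty file: nothing to store
            return 0
        rep = representative(hashes)
        return self.node_for(rep).store(rep, hashes)


if __name__ == "__main__":
    cluster = DedupCluster(num_nodes=4)
    # 16 KiB of deterministic pseudo-random bytes -> 4 distinct chunks
    base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(512))
    print(cluster.put(base))  # 4: every chunk is new
    print(cluster.put(base))  # 0: identical file routes to the same node and cluster
```

Because each node deduplicates only within the cluster selected by the representative, a write touches a single node's in-RAM entry; the trade-off of any representative-based scheme is that some duplicate chunks are missed when similar files happen to receive different minimum hashes.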
%K cloud-storage
%K deduplication
%K throughput
%K file-similarity clustering
%K load balancing
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=F886A295C1E57CC5BB823A244B36A269&yid=99E9153A83D4CB11&vid=771469D9D58C34FF&iid=94C357A881DFC066&sid=FE397BCF5D340F6F&eid=4858DFA42406A0F9&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=0&reference_num=14