Application Research of Computers (计算机应用研究), 2012
Deduplication model based on file-similarity clustering
Abstract:
To resolve the locality dependence and multi-node dependence of current throughput-improvement methods for deduplication systems, this paper proposes a deduplication model based on file-similarity clustering. The model extends the traditional flat index structure into a spatial structure. Following Broder's theorem, it keeps only a handful of the most representative indices in RAM. It partitions the index horizontally and distributes the partitions across several fully autonomous storage nodes. The experimental results indicate that, in a large-scale cloud-storage environment, the model effectively improves deduplication performance and average throughput while keeping data loads balanced. The model can therefore be scaled out smoothly.
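The abstract does not give the chunking scheme, hash function, or routing rule, so the following is only a minimal sketch of the general idea behind Broder's theorem: two files share the same minimum chunk fingerprint with probability equal to the Jaccard similarity of their fingerprint sets, so a few small fingerprints can stand in for a whole file. Fixed-size chunking, SHA-1, and the helper names `chunk_fingerprints`, `representatives`, `route`, and `jaccard` are all assumptions made for illustration, not the paper's design.

```python
from __future__ import annotations

import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 8) -> set[str]:
    """Split data into fixed-size chunks (an assumption) and fingerprint each with SHA-1."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def representatives(fingerprints: set[str], k: int = 2) -> list[str]:
    """Return the k smallest fingerprints; only these would be kept in RAM."""
    return sorted(fingerprints)[:k]


def route(fingerprints: set[str], num_nodes: int) -> int:
    """Pick a storage node from the minimum fingerprint (hypothetical routing rule)."""
    return int(min(fingerprints), 16) % num_nodes


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two fingerprint sets."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = b"The quick brown fox jumps over the lazy dog, twice over."
    edited = original + b" Appended tail."
    fa, fb = chunk_fingerprints(original), chunk_fingerprints(edited)
    print("similarity:", jaccard(fa, fb))
    print("nodes:", route(fa, 4), route(fb, 4))
```

Under these assumptions, similar files tend to be routed to the same node without any node consulting another, which is one way to read the "totally autonomous storage nodes" in the abstract.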