|
- 2018
采用弧映射的双层对象分布算法
|
Abstract:
针对由混合存储设备组成的对象存储系统中数据对象分布时间开销过大的问题,提出了支持权值和副本机制的弧映射双层对象分布(TMHR)算法。该算法利用存储系统批量扩容、删除的特点,将存储系统内存储节点划分为多个子集群,采用可扩展的子集群哈希(SHFC)算法将数据对象按概率分布到子集群上以保证分布的公平性;在子集群内采用随机置换算法将数据对象等概率分布到节点上以降低数据对象分布时间开销。实验结果表明:与一致性哈希算法和随机切片算法相比,TMHR算法的数据对象分布时间分别缩短了20%和28%;数据对象的分布也更加接近理论情况;在存储节点数变化后,存储系统能够迁移较少的数据对象以进行重新均衡。该算法满足公平性、高效性、简洁性和自适应性,可以降低存储系统I/O路径的时延,提高存储系统性能,较适用于异构混合对象存储系统。
An object distribution algorithm based on arc and two??layer mapping is proposed to solve the problem of too much cost time of object distribution in object storage systems consisting of hybrid storage nodes. The algorithm supports weighted allocation and variable levels of object replication mechanism and uses the characteristics of storage system expansion and deletion in batch to divide the storage nodes in the storage system into multiple sub??clusters. The extensible subject group hashing (SHFC) algorithm and a random replacement algorithm are applied to distribute data objects to node to ensure the uniformity of the distribution and to reduce the time overhead. Theoretical analysis shows that the proposed algorithm is fair, simple, effective and adaptive. Experimental results and comparisons with the consistent hashing algorithm and the random slicing algorithm show that the running time of the proposed algorithm reduces 20% and 28%, respectively, and the object distribution of the algorithm is also closer to the theoretical situation. The storage system migrates a minimized number of objects to rebalance the distribution when the number of storage nodes changes. It is concluded that the algorithm could reduce storage system I/O path latency and improve storage system I/O performance, so it is relatively suitable for heterogeneous hybrid object storage system
[1] | [5]ZHOU J, XIE W, GU Q, et al. Hierarchical consistent hashing for heterogeneous object??based storage [C]∥Proceedings of the 14th IEEE International Symposium on Parallel and Distributed Processing with Applications. Piscataway, NJ, USA: IEEE, 2016: 1597??1604. |
[2] | [6]XIE W, ZHOU J, NOBLE J, et al. Two??mode data distribution scheme for heterogeneous storage in data centers [C]∥Proceedings of the 2015 IEEE International Conference on Big Data. Piscataway, NJ, USA: IEEE, 2015: 327??332. |
[3] | [7]WEIL S A, BRANDT S A, MILLER E L, et al. CRUSH: controlled, scalable, decentralized placement of replicated data [C]∥Proceedings of the 2006 ACM/IEEE Conference on Super??Computing. Piscataway, NJ, USA: IEEE, 2006: 31. |
[4] | [8]陈涛, 肖侬, 刘芳. 对象存储系统中一种高效的分层对象布局算法 [J]. 计算机研究与发展, 2012, 49(4): 887??899. |
[5] | CHEN Tao, XIAO Nong, LIU Fang. An efficient hierarchical object placement algorithm for object storage systems [J]. Journal of Computer Research and Development, 2012, 49(4): 887??899. |
[6] | [9]MIRANDA A, EFFERT S, KANG Y, et al. Random slicing: efficient and scalable data placement for large??scale storage systems [J]. ACM Transactions on Storage, 2014, 10(3): 1??35. |
[7] | [15]STINSON D R. Combinatorial designs: constructions and analysis [M]. Berlin, Germany: Springer??Verlag, 2003: 56??57. |
[8] | [1]WEIL S A, BRANDT S A, MILLER E L, et al. Ceph: a scalable, high??performance distributed file system [C]∥Proceedings of the 7th Symposium on Operating Systems Design and Implementation. New York, USA: ACM, 2006: 307??320. |
[9] | [3]FACTOR M, METH K, NAOR D, et al. Object storage: the future building block for storage systems [C]∥Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Technology. Piscataway, NJ, USA: IEEE, 2005: 119??123. |
[10] | [4]KARGER D, LEHMAN E, LEIGHTON T, et al. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web [C]∥Proceedings of the 29th ACM Symposium on Theory of Computing. New York, USA: ACM, 1997: 654??663. |
[11] | [13]SHEN Z, LEE P P C, SHU J, et al. Encoding??aware data placement for efficient degraded reads in XOR??coded storage systems [C]∥ Proceedings of the 35th IEEE Symposium on Reliable Distributed Systems. Piscataway, NJ, USA: IEEE, 2016: 239??248. |
[12] | [14]QIN Y, AI X, CHEN L, et al. Data placement strategy in data center distributed storage systems [C]∥ Proceedings of the 2016 IEEE International Conference on Communication Systems. Piscataway, NJ, USA: IEEE, 2017: 1??6. |
[13] | [2]LAKSHMAN A, MALIK P. Cassandra: a decentralized structured storage system [J]. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35??40. |
[14] | [10]XU M, ZHU Y, LEE P P C, et al. Even data placement for load balance in reliable distributed deduplication storage systems [C]∥Proceedings of the 23th IEEE International Symposium on Quality of Service. Piscataway, NJ, USA: IEEE, 2016: 349??358. |
[15] | [11]JOUINI K. Distorted replicas: intelligent replication schemes to boost I/O throughput in document??stores [C]∥ Proceedings of the 2017 IEEE/ACS International Conference on Computer Systems and Applications. Piscataway, NJ, USA: IEEE, 2017: 25??32. |
[16] | [12]GAO Y, LI K, JIN Y. Compact, popularity??aware and adaptive hybrid data placement schemes for heterogeneous cloud storage [J]. IEEE Access, 2017, 5(99): 1306??1318. |