|
- 2015
基于哈希表的MapReduce算法优化
|
Abstract:
摘要: 分布式并行计算是提高计算机性能常用的方法,但针对不同需求,并行程序的设计并没有统一的模型与方法,使得并行程序的编写完全依靠开发人员的经验。Google公司提出的分布式并行编程模型MapReduce能够完成特定类型的并行程序的开发与运行。使用哈希表对MapReduce分布式并行编程模型进行优化,减少中间结果中的碎片,并省略Combiner中间函数的调用,减少传输负载,提升运行效率,同时兼顾了Map函数与Reduce函数接口的属性,保持了MapReduce模型的并行性特点。
Abstract: Distributed parallel computing is commonly used to improve computer performance. But according to different demands, there is not a uniform way to design and implement parallel program. Parallel programming depends on the experience of developer. MapReduce, a distributed parallel programming model, put forward by Google, can perform special parallel program development and operation. MapReduce was optimized by using Hash table, which would decrease fragment of Map function, skip other redundancy function such as Combiner function, reduce transmission load and improve computing efficiency. Meanwhile, the attributes of Map function and Reduce function were kept to make MapReduce maintaining parallel
[1] | 谢桂兰, 罗省贤. 基于 Hadoop MapReduce 模型的应用研究[J]. 微型机与应用, 2010, 25(3):4-7. XIE Guilan, LUO Shengxian. Based on Hadoop MapReduce model apply and research[J]. Journal of Microcomputer and Applications, 2010, 25(3):4-7. |
[2] | 张雪萍,龚康莉,赵广才. 基于MapReduce的K-Medoids 并行算法[J]. 计算机应用, 2013, 33(4):1023-1025. ZHANG Xueping, GONG Kangli, ZHAO Guangcai. Parallel K-Medoids algorithm based on MapReduce[J].Journal of Computer Applications, 2013, 33(4):1023-1025. |
[3] | APACHE. Welcome to Apache Hadoop[EB/OL].[2014-09-09]. http://hadoop.apache.org/. 2014 |
[4] | 李建江, 崔健, 王聃, 等. MapReduce并行编程模型研究综述[J]. 电子学报, 2011, 39(11):2635-2642. LI Jianjiang, CUI Jian, WANG Dan, et al. Review of MapReduce parallel computing programming model[J]. Chinese Journal of Electronics, 2011, 39(11):2635-2642. |
[5] | LEISERSON C E, RIVEST R L, STEIN C. Introduction to algorithms[M]. The MIT Press, 2001. |
[6] | 陈全,邓倩妮.云计算及其关键技术[J].计算机应用,2009,29(9):2562-2567. CHEN Quan, DENG Qianni.Cloud computing and its key techniques[J]. Journal of Computer Applications, 2009, 29(9):2562-2567. |
[7] | 郭进伟,皮建勇. 基于MapReduce的SON算法实现[J]. 计算机应用,2014, 34(S1):100-102. GUO Jinwei, PI Jianyong. Realization SON algorithm based on MapReduce[J]. Journal of Computer Applications, 2014, 34(S1):100-102. |
[8] | BONDHUGULA U.Automatic distributed-memory parallelization and code generation using the polyhedral framework, IISc-CSA-TR-2011-3[R].Bangalore: Indian Institute of Science, 2011. |
[9] | DEAN J, GHEMAWAT S. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1):107-113. |
[10] | 陆秋,程小辉. 基于MapReduce 的决策树算法并行化[J].计算机应用, 2012, 32(9):2463-2465. LU Qiu, CHENG Xiaohui. Parallelization of decision tree algorithm based on MapReduce[J].Journal of Computer Applications, 2012, 32(9):2463-2465. |
[11] | YANG H, DASDAN A, HSIAO R L, et al. Map-reduce-merge: simplified relational data processing on large clusters[C]//Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2007:1029-1040. |
[12] | 孙牧. 云端的小飞象——Hadoop[J]. 程序员, 2008(10):100-102. SUN Mu. The flying elephant on cloud—Haoop[J]. Journal of Programmer, 2008(10):100-102. |
[13] | 李成华, 张新访, 金海, 等. MapReduce: 新型的分布式并行计算编程模型[J]. 计算机工程与科学, 2011, 33(3):129-135. LI Chenghua, ZHANG Xinfang, JIN Hai, et al. MapReduce: a new distributed parallel computing programming model[J]. Journal of Computer Engineering and Science, 2011, 33(3):129-135. |