2016
An Accurate Memory Prediction Method for MapReduce Jobs
Abstract:
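Accurately predicting the memory requirements of MapReduce jobs is difficult. This paper proposes a generational memory prediction method that builds on the generational memory management of the Java Virtual Machine (JVM), which divides the heap into a young generation and an old generation. We build a model relating young-generation size to garbage-collection time, cast the problem of finding a suitable young-generation size as a constrained nonlinear optimization problem, and design a search algorithm to solve it. We then build models relating the performance of map tasks and reduce tasks to memory, and solve them for the memory requirement that yields the best performance; this gives the old-generation sizes for map and reduce tasks. Experimental results show that the proposed method accurately predicts job memory requirements and, compared with the default configuration, improves performance by a factor of 6 on average.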
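The two steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the paper's actual models, so everything below is an assumption: minor-GC time is modeled hypothetically as T(Y) = a/Y + b and fitted by least squares; the constrained problem is taken to be "find the smallest young-generation size Y in [y_min, y_max] whose modeled GC time does not exceed t_max", solved by bisection; and the "best-performance" memory is taken to be the smallest heap whose profiled runtime is within a chosen slack of the fastest run. The old generation is then the heap minus the young generation. All function names, parameters, and numbers are hypothetical.

```python
"""Hypothetical sketch of generational memory sizing for one MapReduce task.

Assumptions (not taken from the paper): minor-GC time follows T(Y) = a / Y + b,
and task runtime stops improving once the heap covers the task's real need.
"""
from typing import Sequence, Tuple


def fit_gc_model(samples: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of T(Y) = a / Y + b from (young_mb, gc_seconds) samples.

    Needs at least two distinct young-generation sizes.
    """
    xs = [1.0 / y for y, _ in samples]  # T is linear in x = 1/Y
    ts = [t for _, t in samples]
    n = len(xs)
    mx, mt = sum(xs) / n, sum(ts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
    a = sxt / sxx
    b = mt - a * mx
    return a, b


def smallest_young_gen(a: float, b: float, t_max: float,
                       y_min: float, y_max: float, tol: float = 1.0) -> float:
    """Smallest Y in [y_min, y_max] with modeled GC time a / Y + b <= t_max.

    With a > 0 the model decreases in Y, so bisection finds the feasibility boundary.
    """
    if a <= 0:
        raise ValueError("model is not decreasing in Y; bisection does not apply")

    def gc_time(y: float) -> float:
        return a / y + b

    if gc_time(y_max) > t_max:
        raise ValueError("no feasible young-generation size in range")
    if gc_time(y_min) <= t_max:
        return y_min
    lo, hi = y_min, y_max  # invariant: lo is infeasible, hi is feasible
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gc_time(mid) <= t_max:
            hi = mid
        else:
            lo = mid
    return hi


def memory_for_best_performance(profile: Sequence[Tuple[float, float]],
                                slack: float = 0.05) -> float:
    """Smallest heap (MB) whose runtime is within `slack` of the fastest profiled run."""
    best = min(rt for _, rt in profile)
    return min(mem for mem, rt in profile if rt <= best * (1 + slack))


if __name__ == "__main__":
    # Made-up measurements: (young-gen MB, minor-GC seconds) and (heap MB, task seconds).
    gc_samples = [(64, 32.25), (128, 16.625), (256, 8.8125), (512, 4.90625)]
    runtime_profile = [(512, 120.0), (1024, 80.0), (1536, 62.0),
                       (2048, 60.0), (3072, 59.5)]

    a, b = fit_gc_model(gc_samples)
    young = smallest_young_gen(a, b, t_max=11.0, y_min=32, y_max=1024)
    heap = memory_for_best_performance(runtime_profile)
    old = heap - young  # old generation = heap minus young generation
    print(f"young={young:.0f} MB heap={heap:.0f} MB old={old:.0f} MB")
```

With these made-up inputs the search settles at about a 200 MB young generation and a 1536 MB heap, leaving roughly 1.3 GB for the old generation. In HotSpot, such sizes would correspond to the `-Xmn` (young generation) and `-Xmx` (maximum heap) options. The bisection is valid only while the fitted `a` is positive, because that is what makes the modeled GC time decrease as Y grows.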