To manage the power consumption of parallel applications at runtime, modern processors provide frequency scaling and power limiting capabilities. In this work, a runtime strategy is proposed to distribute a given power allocation among the cluster nodes assigned to an application while balancing the change in their performance. The strategy operates in a timeslice-based manner: in each timeslice, it estimates the current application performance and power usage per node and then redistributes power across the nodes. Experiments performed on four nodes (112 cores) of a modern computing platform interconnected with InfiniBand showed that, under the proposed strategy, even a significant power budget reduction of 20% may result in a performance degradation as small as 1% compared with execution under unlimited power.
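The per-timeslice redistribution step can be illustrated with a small sketch. It is not the strategy proposed in this work. The weighting rule, the `NodeSample` fields, and the `redistribute` function are all hypothetical. The sketch assumes that, at the end of each timeslice, every node reports its average power draw and a performance ratio relative to the unlimited-power run. Nodes whose performance dropped more than the average receive a proportionally larger share of the fixed budget.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """Hypothetical per-node measurements collected over one timeslice."""
    power_w: float     # average power drawn during the timeslice, in watts
    perf_ratio: float  # performance relative to the unlimited-power run (1.0 = no slowdown)


def redistribute(budget_w: float, samples: list[NodeSample]) -> list[float]:
    """Split budget_w across nodes, favouring nodes whose performance dropped more.

    Hypothetical heuristic: scale each node's current power by
    mean_ratio / perf_ratio, then rescale so the new caps sum to budget_w.
    """
    if budget_w <= 0 or not samples:
        raise ValueError("need a positive budget and at least one node")
    if any(s.perf_ratio <= 0 or s.power_w < 0 for s in samples):
        raise ValueError("perf_ratio must be positive and power_w non-negative")
    mean_ratio = sum(s.perf_ratio for s in samples) / len(samples)
    # Slower-than-average nodes (perf_ratio < mean_ratio) get weight > power_w.
    weights = [s.power_w * mean_ratio / s.perf_ratio for s in samples]
    total = sum(weights)
    if total == 0:
        # No measured power: fall back to an even split of the budget.
        return [budget_w / len(samples)] * len(samples)
    return [budget_w * w / total for w in weights]


# Four nodes, each assumed (hypothetically) to draw 400 W when uncapped,
# so a 20% budget reduction leaves 0.8 * 4 * 400 = 1280 W to share.
samples = [NodeSample(320.0, r) for r in (1.0, 0.98, 0.95, 0.99)]
caps = redistribute(0.8 * 4 * 400.0, samples)
```

Here the node reporting `perf_ratio = 0.95` receives the largest cap and the node at `1.0` the smallest, while the caps still sum to the 1280 W budget. Rescaling by the measured power keeps the new allocation close to the current operating point, so a real controller would typically also clamp each cap to the hardware's supported power-limit range.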