|
- 2015
A Monitoring and Management System for Large-Scale Computers
Abstract: As supercomputer performance improves, system scale keeps growing, and managing these systems efficiently has become one of the key problems that high-performance computing urgently needs to solve. This paper proposes MMS (Monitoring and Management System), a monitoring and management system for large-scale computers. MMS adopts a distributed architecture to improve the efficiency of monitoring and management; fine-grained processing of monitoring information reduces the impact of the monitoring system on the computing network while also improving the response speed of the web-based client; and a two-level asynchronous communication mechanism improves the efficiency of data collection. Theoretical analysis and experimental results show that MMS runs efficiently and reliably.
|