%0 Journal Article
%T Data collection method for failure prediction on exascale supercomputers
%A 胡维
%A 蒋艳凰
%A 刘光明
%A 董文睿
%A 崔新武
%A HU Wei
%A JIANG Yanhuang
%A LIU Guangming
%A DONG Wenrui
%A CUI Xinwu
%J 国防科技大学学报 (Journal of National University of Defense Technology)
%D 2016
%R 10.11887/j.cn.201601016
%X Aimed at future exascale supercomputers, a failure prediction data collection framework (FPDC) is proposed that comprehensively collects the state data related to compute node failures. An adaptive multi-layer grouped data aggregation method addresses the problem of aggregation overhead growing excessively as the system scales. Implementation and testing on the TH-1A supercomputer show that the framework has low overhead and good scalability, and that it can meet the data collection needs of failure prediction on future large-scale systems.
%K supercomputer; failure prediction; data collection method; data aggregation
%U http://journal.nudt.edu.cn/gfkjdxxb/ch/reader/view_abstract.aspx?file_no=201601016&flag=1