Quasi-optimal period computation model for hierarchical checkpoint protocol

Authors: LYU Hongwu (吕宏武)[1], GU Lei (谷雷), WANG Huiqiang (王慧强)[1], ZOU Shichen (邹世辰)[1], FENG Guangsheng (冯光升)[1]

Affiliation: [1] College of Computer Science and Technology, Harbin Engineering University, Harbin 150001

Source: Journal of Computer Applications (《计算机应用》), 2017, No. 1, pp. 103-107 (5 pages)

Funding: National Natural Science Foundation of China (61370212; 61402127; 61502118); Natural Science Foundation of Heilongjiang Province (F2015029)

Abstract: To improve checkpoint efficiency in large-scale High Performance Computing (HPC) systems, a quasi-optimal period computation model for the hierarchical checkpoint protocol is proposed. First, the execution of an application in an HPC system is analyzed, and checkpoint period optimization is abstracted as a nonlinear checkpoint cost model. Second, a hierarchical checkpoint cost formula is derived by analyzing the possible fault locations, and two deceleration factors and one acceleration factor are introduced to model the impact of message logging on hierarchical checkpointing. Simulation results show that the average error between the checkpoint cost of the proposed model and that of the theoretical quasi-optimal period is below 5%, which is 20% lower than the average error of the traditional Markov-chain-based checkpoint period optimization model. The proposed model can therefore effectively improve the efficiency of the hierarchical checkpoint protocol and enhance the availability of HPC systems.
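For readers unfamiliar with checkpoint period optimization, the sketch below illustrates the general idea of minimizing a nonlinear checkpoint cost over the period. It is not the model proposed in the paper: the abstract does not give the hierarchical cost formula or the deceleration and acceleration factors, so the sketch substitutes the classical single-level first-order model (Young's approximation). The function names and the parameter values (a 60 s checkpoint cost and a one-day mean time between failures) are assumptions chosen only for illustration.

```python
import math


def waste(period, ckpt_cost, mtbf):
    """First-order fraction of time lost for a given checkpoint period.

    ckpt_cost / period    : overhead of writing one checkpoint per period
    period / (2 * mtbf)   : expected rework after a failure (half a period on average)
    This is the classical single-level model, NOT the paper's hierarchical one.
    """
    return ckpt_cost / period + period / (2.0 * mtbf)


def young_period(ckpt_cost, mtbf):
    """Closed-form minimizer of waste(): sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * ckpt_cost * mtbf)


def grid_search_period(ckpt_cost, mtbf, steps=100_000):
    """Brute-force minimizer over an evenly spaced grid of periods in [1, mtbf]."""
    lo, hi = 1.0, mtbf
    best_period, best_waste = lo, waste(lo, ckpt_cost, mtbf)
    for i in range(1, steps + 1):
        period = lo + i * (hi - lo) / steps
        w = waste(period, ckpt_cost, mtbf)
        if w < best_waste:
            best_period, best_waste = period, w
    return best_period


if __name__ == "__main__":
    C, M = 60.0, 86400.0  # assumed values, in seconds
    print("closed-form period:", young_period(C, M))
    print("grid-search period:", grid_search_period(C, M))
```

The grid search and the closed form should agree to within the grid spacing. A hierarchical model such as the one described in the abstract would replace `waste` with a cost function that also accounts for message logging, at which point a numerical search like `grid_search_period` is typically the more practical option.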

Keywords: high performance computing; fault tolerance; hierarchical checkpoint; checkpoint period; quasi-optimal solution

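Classification: TP399 [Automation and Computer Technology: Computer Application Technology]; TP302 [Automation and Computer Technology: Computer Science and Technology]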

 
