supported by the Beijing Natural Science Foundation (4174100); NSFC (61602054); the Fundamental Research Funds for the Central Universities
Cloud computing is becoming an important solution for providing scalable computing resources via the Internet. Because there are tens of thousands of nodes in a data center, the probability of server failures is nontrivial....
supported by the National Basic Research Program of China (Grant No. 2007CB310900); China National Natural Science Foundation (NSFC) (Grant Nos. 60973133, 61133006); MoE-Intel Information Technology Special Research Foundation (Grant No. MOE-INTEL-10-05); U.S. NSF (Grant No. CNS-0914330)
Recent advances in virtualization technology provide a new approach to checkpoint/restart at the virtual machine (VM) level. In contrast to traditional process-level checkpointing, checkpointing at the virtualiza...
supported by the National Natural Science Foundation of China under Grant Nos. 60921062, 61003087, 61120106005 and 61170049
GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer...
supported by the National High Technology Research and Development 863 Program of China under Grant No. 2007AA01Z117; the National Basic Research 973 Program of China under Grant No. 2007CB310900
A high-performance computer (HPC) is a large, complex system whose architecture design faces increasing difficulties and risks. Traditional methods, such as theoretical analysis, component-level simulation and s...
the Postdoctoral Science Foundation (No. 20060390461); the Basic Research Foundation of Harbin Engineering University (Nos. HEUF040806, HEUFT05009, and HEUFP05020)
When applied to mobile computing systems, checkpoint protocols for distributed computing systems face many new challenges, such as low wireless bandwidth, frequent disconnections, and lack of stable storage at mo...
To reduce the execution overhead of coordinated checkpoint algorithms, this paper introduces the concept of a "computing checkpoint" to design an efficient coordinated checkpoint algorithm. By piggybacking the inf...
A checkpointing scheme is proposed for related distributed real-time tasks that can be scheduled as a DAG. A typical algorithm, OSA, is selected for DAG scheduling. A new method based on a new structure, Scheduled Clu...
The adaptive checkpointing strategy is an efficient recovery scheme that is suitable for mobile computing systems. However, none of the existing adaptive checkpointing schemes recovers the system correctly when failure occu...