Project supported by the National Natural Science Foundation of China (Nos. 61272141, 61120106005, and 61303068) and the National High-Tech R&D Program of China (No. 2012AA01A301)
As the scale of supercomputers rapidly grows, the reliability problem dominates the system availability. Existing fault tolerance mechanisms, such as periodic checkpointing and process redundancy, cannot effectively f...
Project supported by the National Natural Science Foundation of China (Nos. 61272141 and 61120106005) and the National High-Tech R&D Program (863) of China (No. 2012AA01A301)
The mismatch between compute performance and I/O performance has long been a stumbling block as supercomputers evolve from petaflops to exaflops. Currently, many parallel applications are I/O intensive, and their overa...
Project supported by the National Natural Science Foundation of China (Nos. 61303071, 61120106005, and 60921062)
Current parallel programs are often composed of many processes that communicate through message passing. The mapping of the program processes onto the underlying processor. By analyzing the MPI library, we find that...