Supported by the State Key Program of the National Natural Science Foundation of China (Grant Nos. 61432018, 61133005); the National Natural Science Foundation of China (Grant No. 61272136); the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 61221062); and the National Basic Research Program of China (Grant No. 2013CB329606).
To achieve good performance and scalability on multicore clusters, parallel applications must be carefully optimized to exploit intra-node parallelism and to reduce inter-node communication. This paper investigate...