Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences [2] School of Computer and Communication Engineering, University of Science and Technology Beijing [3] Department of Computer Science, University of Illinois at Urbana-Champaign
Published in: Science China (Information Sciences), 2015, No. 9, pp. 13-26 (14 pages)
Funding: supported by the State Key Program of the National Natural Science Foundation of China (Grant Nos. 61432018, 61133005); the National Natural Science Foundation of China (Grant No. 61272136); the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 61221062); and the National Basic Research Program of China (Grant No. 2013CB329606)
Abstract: To achieve good performance and scalability on multicore clusters, parallel applications must be carefully optimized to exploit intra-node parallelism and reduce inter-node communication. This paper investigates the automatic tuning of the sparse matrix-vector (SpMV) multiplication kernel implemented in a partitioned global address space language, which supports a hybrid thread- and process-based communication layer for multicore systems. One-sided communication is used for inter-node data exchange, while intra-node communication uses a mix of process shared memory and multithreading. We develop performance models to facilitate selecting the best configuration of thread/process hybridization as well as the best communication pattern for SpMV. As a result, our tuned SpMV in the hybrid runtime environment consumes less memory and reduces inter-node communication volume, without damaging data locality. Experiments are conducted on 12 real sparse matrices. On 16-node Xeon and 8-node Opteron clusters, our tuned SpMV kernel achieves on average 1.4X and 1.5X performance improvement, respectively, over a well-optimized process-based message-passing implementation.
Keywords: SpMV; PGAS; hybridization; model-driven; multicore clusters