Automatic tuning of sparse matrix-vector multiplication on multicore clusters (Cited by: 3)

Authors: LI ShiGang, HU ChangJun, ZHANG JunChao, ZHANG YunQuan

Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; [2] School of Computer and Communication Engineering, University of Science and Technology Beijing; [3] Department of Computer Science, University of Illinois at Urbana-Champaign

Source: Science China (Information Sciences), 2015, No. 9, pp. 13-26 (14 pages)

Funding: Supported by the State Key Program of the National Natural Science Foundation of China (Grant Nos. 61432018, 61133005); the National Natural Science Foundation of China (Grant No. 61272136); the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 61221062); and the National Basic Research Program of China (Grant No. 2013CB329606)

Abstract: To achieve good performance and scalability on multicore clusters, parallel applications must be carefully optimized to exploit intra-node parallelism and reduce inter-node communication. This paper investigates the automatic tuning of the sparse matrix-vector multiplication (SpMV) kernel implemented in a partitioned global address space (PGAS) language, which supports a hybrid thread- and process-based communication layer for multicore systems. One-sided communication is used for inter-node data exchange, while intra-node communication uses a mix of process-shared memory and multithreading. We develop performance models to facilitate selecting the best configuration of thread/process hybridization as well as the best communication pattern for SpMV. As a result, our tuned SpMV in the hybrid runtime environment consumes less memory and reduces inter-node communication volume without damaging data locality. Experiments are conducted on 12 real sparse matrices. On 16-node Xeon and 8-node Opteron clusters, our tuned SpMV kernel achieves on average 1.4X and 1.5X performance improvements, respectively, over a well-optimized process-based message-passing implementation.
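As background for the abstract, the core SpMV operation y = A*x over a matrix stored in compressed sparse row (CSR) format can be sketched in C as below. This is a minimal serial sketch for illustration only; the function name and data layout are assumptions, not the paper's implementation, which layers hybrid thread/process parallelism and one-sided inter-node communication on top of such a loop.

```c
#include <assert.h>

/* Serial SpMV, y = A*x, with A in CSR format:
   rowptr[i]..rowptr[i+1] indexes the nonzeros of row i,
   colidx[k] is the column of the k-th nonzero, vals[k] its value. */
static void spmv_csr(int nrows, const int *rowptr, const int *colidx,
                     const double *vals, const double *x, double *y)
{
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += vals[k] * x[colidx[k]];
        y[i] = sum;
    }
}
```

The outer loop over rows is independent, which is what makes SpMV amenable to the intra-node multithreading and inter-node partitioning the paper tunes; the irregular accesses to x through colidx are the source of the data-locality and communication-volume concerns the abstract mentions.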

Keywords: SpMV; PGAS; hybridization; model-driven; multicore clusters

Classification codes: TP332 [Automation and Computer Technology / Computer Architecture]; O183.1 [Automation and Computer Technology / Computer Science and Technology]

 
