Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences [2] School of Computer and Communication Engineering, University of Science and Technology Beijing [3] Department of Computer Science, University of Illinois at Urbana-Champaign
Published in: Science China (Information Sciences), 2015, No. 9, pp. 13-26 (14 pages)
Funding: supported by the State Key Program of the National Natural Science Foundation of China (Grant Nos. 61432018, 61133005); the National Natural Science Foundation of China (Grant No. 61272136); the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 61221062); and the National Basic Research Program of China (Grant No. 2013CB329606)
Abstract: To achieve good performance and scalability on multicore clusters, parallel applications must be carefully optimized to exploit intra-node parallelism and reduce inter-node communication. This paper investigates the automatic tuning of the sparse matrix-vector (SpMV) multiplication kernel implemented in a partitioned global address space language, which supports a hybrid thread- and process-based communication layer for multicore systems. One-sided communication is used for inter-node data exchange, while intra-node communication uses a mix of process shared memory and multithreading. We develop performance models to facilitate selecting the best configuration of thread/process hybridization as well as the best communication pattern for SpMV. As a result, our tuned SpMV in the hybrid runtime environment consumes less memory and reduces inter-node communication volume, without damaging data locality. Experiments are conducted on 12 real sparse matrices. On 16-node Xeon and 8-node Opteron clusters, our tuned SpMV kernel achieves on average 1.4X and 1.5X performance improvement, respectively, over a well-optimized process-based message-passing implementation.
Keywords: SpMV; PGAS; hybridization; model-driven; multicore clusters