Memory bandwidth optimization of SpMV on GPGPUs

Memory bandwidth optimization of SpMV on GPGPUs

作　　者：Chenggang Clarence YAN Hui YU Weizhi XU Yingping ZHANG Bochuan CHEN Zhu TIAN Yuxuan WANG Jian YIN

机构地区：[1]Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China [2]Institute of Microelectronics, Tsinghua University, Beijing 100084, China [3]Automation Department, Tsinghua University, Beijing 100084, China [4]State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China [5]State Grid Information & Communication Company of Hunan EPC, Changsha 410007, China [6]Department of Computer, Shandong University, Weihai 250101, China

出　　处：《Frontiers of Computer Science》2015年第3期431-441,共11页中国计算机科学前沿（英文版）

摘　　要：It is an important task to improve performance for sparse matrix vector multiplication （SpMV）, and it is a difficult task because of its irregular memory access. Gen- eral purpose GPU （GPGPU） provides high computing abil- ity and substantial bandwidth that cannot be fully exploited by SpMV due to its irregularity. In this paper, we propose two novel methods to optimize the memory bandwidth for SpMV on GPGPU. First, a new storage format is proposed to exploit memory bandwidth of GPU architecture more effi- ciently. The new storage format can ensure that there are as many non-zeros as possible in the format which is suitable to exploit the memory bandwidth of the GPU. Second, we pro- pose a cache blocking method to improve the performance of SpMV on GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With the block- ing method, the corresponding part of vector x can be reused in the GPU cache, so the time to access the global memory for vector x is reduced heavily. Experiments are carried out on three GPU platforms, GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. Experimental results show that both new methods can efficiently improve the utilization of GPU mem- ory bandwidth and the performance of the GPU.It is an important task to improve performance for sparse matrix vector multiplication （SpMV）, and it is a difficult task because of its irregular memory access. Gen- eral purpose GPU （GPGPU） provides high computing abil- ity and substantial bandwidth that cannot be fully exploited by SpMV due to its irregularity. In this paper, we propose two novel methods to optimize the memory bandwidth for SpMV on GPGPU. First, a new storage format is proposed to exploit memory bandwidth of GPU architecture more effi- ciently. The new storage format can ensure that there are as many non-zeros as possible in the format which is suitable to exploit the memory bandwidth of the GPU. Second, we pro- pose a cache blocking method to improve the performance of SpMV on GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With the block- ing method, the corresponding part of vector x can be reused in the GPU cache, so the time to access the global memory for vector x is reduced heavily. Experiments are carried out on three GPU platforms, GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. Experimental results show that both new methods can efficiently improve the utilization of GPU mem- ory bandwidth and the performance of the GPU.

关键词：GPGPU performance tuning SpMV cacheblocking memory bandwidth

分类号：TP333[自动化与计算机技术—计算机系统结构] TN915.1[自动化与计算机技术—计算机科学与技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

Memory bandwidth optimization of SpMV on GPGPUs

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

Memory bandwidth optimization of SpMV on GPGPUs

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索