High-Performance Sparse Matrix-Vector Multiplication on GPU

High Performance spMV for Banded Sparse Matrix on GPU

Authors: WANG Yingrui[1], TIAN Rong[1]

Affiliation: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

Source: 《高性能计算技术》 (High Performance Computing Technology), 2012, No. 6, pp. 19-25 (7 pages)

Abstract: For "banded" sparse matrices, this paper proposes and implements an efficient storage format and matrix-vector multiplication algorithm, "bDIA". bDIA is implemented on top of CUSP, an open-source CUDA linear algebra library, and is compared with the five common sparse matrix storage formats supported by CUSP and their corresponding spMV algorithms. Tests were run on NVIDIA GTX280-series GPUs. The results show that the proposed bDIA format and its spMV algorithm reach about 60 Gflops in single precision and about 30 Gflops in double precision. This breaks through the 4% single-precision and 22.2% double-precision floating-point efficiency ceilings of spMV on this GPU series, and nearly doubles the floating-point efficiency relative to the best of the other five common formats.

Numerical methods for PDEs, such as finite element and finite difference methods, are mostly "compactly supported". Because of this compact support, the global matrices these methods produce in science and engineering are sparse and very often also banded. In this paper, we propose and develop a high-performance spMV algorithm for this specific but widely used sparse matrix type. The new algorithm, termed "bDIA", is implemented on GTX280-series GPUs using the open-source CUDA linear algebra library CUSP. Detailed comparisons with the 5 other popular, de facto standard sparse matrix formats/algorithms supported in CUSP show that bDIA achieves 60 Gflops and 30 Gflops in single and double precision respectively, roughly doubling the best performance of the other 5 widely used algorithms.
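The abstract does not spell out the bDIA layout itself. As background only, here is a minimal CPU reference sketch of spMV in the conventional DIA (diagonal) format, which CUSP also supports and which the bDIA name suggests it builds on (an assumption). The function name `dia_spmv` and the row-aligned storage, where `data[k][i]` holds `A[i][i + offsets[k]]`, are choices made for this sketch and are not taken from the paper.

```python
def dia_spmv(offsets, data, x):
    """Compute y = A @ x for an n-by-n matrix A stored in DIA format.

    offsets[k] is the offset of the k-th stored diagonal
    (0 = main diagonal, > 0 = superdiagonal, < 0 = subdiagonal).
    data[k][i] holds A[i][i + offsets[k]] (row-aligned layout, an
    assumption of this sketch); slots whose column index falls outside
    [0, n) are padding and are never read.
    """
    n = len(x)
    y = [0.0] * n
    for k, off in enumerate(offsets):
        diag = data[k]
        # Restrict to rows i whose column i + off lies inside the matrix.
        start = max(0, -off)
        end = min(n, n - off)
        for i in range(start, end):
            y[i] += diag[i] * x[i + off]
    return y


if __name__ == "__main__":
    # Tridiagonal example: 4 on the main diagonal, 1 on both neighbors.
    offsets = [-1, 0, 1]
    data = [
        [0.0, 1.0, 1.0, 1.0],  # subdiagonal; data[0][0] is padding
        [4.0, 4.0, 4.0, 4.0],  # main diagonal
        [1.0, 1.0, 1.0, 0.0],  # superdiagonal; data[2][3] is padding
    ]
    print(dia_spmv(offsets, data, [1.0, 2.0, 3.0, 4.0]))  # [6.0, 12.0, 18.0, 19.0]
```

On a GPU, a common mapping for this kind of format is one thread per row `i`. When each diagonal is stored contiguously, neighboring threads then read neighboring entries of the diagonal and of `x`, which gives coalesced memory access without any column-index array. That regularity is why diagonal-based formats suit banded matrices; the price is the padding stored along the shorter diagonals.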

Keywords: banded sparse matrix-vector multiplication; bDIA; generalized finite element; GPU

Classification number: TP301 [Automation and Computer Technology: Computer Systems Architecture]
