Quasi-diagonal Matrix Hybrid Compression Algorithm and Implementation for SpMV on GPU

Cited by: 5

Authors: Yang Wangdong (阳王东)[1,2], Li Kenli (李肯立), Shi Lin (石林)

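Affiliations: [1] School of Information Science and Engineering, Hunan City University, Yiyang 413082; [2] National Supercomputing Center in Changsha, Changsha 410082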

Source: Computer Science (《计算机科学》), 2014, No. 7, pp. 290-296 (7 pages)

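Funding: Key Program of the National Natural Science Foundation of China (61133005); National Natural Science Foundation of China (61070057); National Science and Technology Support Program (2012BAH09B02); Cultivation Fund for Major Projects of the Ministry of Education Science and Technology Innovation Program (708066); Doctoral Program Foundation of the Ministry of Education (20100161110019); Ministry of Education Program for New Century Excellent Talents (NCET-08-0177); Key Scientific Research Project of the Hunan Provincial Department of Education (13A011)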

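Abstract: Sparse matrix-vector multiplication (SpMV) is a basic operation in scientific computing and engineering applications, and its high-performance implementation and optimization is an active research topic in computational science. Solving differential equations produces large sparse matrices, a large share of which are quasi-diagonal. To handle the irregularities of quasi-diagonal matrices, a hybrid format combining diagonal storage (DIA) and compressed sparse row storage (CSR) is proposed for SpMV. Scattered nonzero elements outside the partitioned diagonal region are stored in CSR, which overcomes the weakness of DIA that the number of stored columns grows rapidly for irregular matrices. The diagonals themselves are stored in DIA, which exploits the diagonal structure of the matrix and reduces the imbalance in the number of nonzeros per row that CSR suffers from. The bandwidth of the stored diagonals can be adjusted to suit different scatter patterns of quasi-diagonal matrices, giving a higher compression ratio than either DIA or CSR and reducing the amount of data to be processed. Experiments on a GPU using the CUDA platform show that the method achieves a higher speedup than DIA and CSR.

English abstract: Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra, which is an important issue in scientific computing and engineering practice. Much effort has been put into accelerating SpMV, and a few parallel solutions have been proposed. In this paper we focus on a special case of SpMV, sparse quasi-diagonal matrix-vector multiplication (SQDMV). Sparse quasi-diagonal matrices are key to solving many differential equations, yet very little research has been done in this area. We discuss data structures and algorithms for SQDMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. We present a new diagonal storage format, HDC, which overcomes the inefficiency of DIA in storing irregular matrices and the imbalance of CSR in storing nonzero elements. Further, HDC can adjust the storage bandwidth of the diagonals to adapt to different degrees of scatter in sparse matrices, so as to obtain a higher compression ratio than DIA and CSR and reduce the computational complexity. Our implementation on the GPU shows that HDC performs better than the other formats, especially for matrices with some scattered points outside the main diagonal. In addition, we combine the different parts of HDC into a unified kernel to obtain a better compression ratio and a higher speedup on the GPU.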
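The hybrid idea described in the abstract can be illustrated with a small sketch. This is not the authors' HDC implementation or their CUDA kernel: it is a plain-Python, sequential illustration written under stated assumptions. The function names `split_hybrid` and `hybrid_spmv`, the symmetric band parameter `half_bandwidth`, and the example matrix are all hypothetical. Diagonals inside the band go into a DIA-style array, every nonzero outside the band goes into CSR arrays, and a single loop over rows combines both parts, loosely in the spirit of the unified kernel the abstract mentions.

```python
def split_hybrid(A, half_bandwidth):
    """Split a square matrix (list of lists) into a DIA band and a CSR remainder.

    Assumption: the band is symmetric around the main diagonal and covers
    offsets -half_bandwidth .. +half_bandwidth.
    """
    n = len(A)
    offsets = list(range(-half_bandwidth, half_bandwidth + 1))
    # DIA part: dia[d][i] holds A[i][i + offsets[d]], zero-padded where the
    # diagonal runs outside the matrix.
    dia = [[A[i][i + off] if 0 <= i + off < n else 0.0 for i in range(n)]
           for off in offsets]
    # CSR part: only the scattered nonzeros outside the band.
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j in range(n):
            if abs(j - i) > half_bandwidth and A[i][j] != 0:
                col_idx.append(j)
                vals.append(A[i][j])
        row_ptr.append(len(vals))
    return offsets, dia, row_ptr, col_idx, vals


def hybrid_spmv(n, offsets, dia, row_ptr, col_idx, vals, x):
    """Compute y = A @ x from the hybrid representation."""
    y = [0.0] * n
    # Each iteration is independent; on a GPU one thread could take one row.
    for i in range(n):
        acc = 0.0
        for d, off in enumerate(offsets):            # banded (DIA) contribution
            j = i + off
            if 0 <= j < n:
                acc += dia[d][i] * x[j]
        for p in range(row_ptr[i], row_ptr[i + 1]):  # scattered (CSR) contribution
            acc += vals[p] * x[col_idx[p]]
        y[i] = acc
    return y


# Hypothetical quasi-diagonal example: tridiagonal plus two corner entries.
A = [[4, 1, 0, 0, 2],
     [1, 4, 1, 0, 0],
     [0, 1, 4, 1, 0],
     [0, 0, 1, 4, 1],
     [3, 0, 0, 1, 4]]
x = [1, 2, 3, 4, 5]

offsets, dia, row_ptr, col_idx, vals = split_hybrid(A, half_bandwidth=1)
y = hybrid_spmv(len(A), offsets, dia, row_ptr, col_idx, vals, x)
y_dense = [sum(a * b for a, b in zip(row, x)) for row in A]
```

In this example a pure DIA layout would need five full diagonals (offsets -4, -1, 0, 1, 4) to capture the two corner entries, whereas the hybrid layout keeps three diagonals and only two CSR values. Widening or narrowing `half_bandwidth` shifts elements between the two parts, which corresponds to the adjustable bandwidth described in the abstract.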

Keywords: graphics processing unit (GPU); sparse matrix; sparse matrix-vector multiplication (SpMV); CUDA

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
