Source: 《高性能计算技术》 (High Performance Computing Technology), 2012, No. 6, pp. 19-25 (7 pages)
Abstract: Numerical methods for PDEs, such as finite element and finite difference methods, are mostly compactly supported. Because of this compact support, the global matrices these methods produce in scientific and engineering applications are sparse and very often band-shaped. This paper proposes and implements "bDIA", an efficient storage format and sparse matrix-vector multiplication (spMV) algorithm for this specific but widely used matrix type. bDIA is built on the open-source CUDA linear algebra library CUSP and is compared against the five common sparse matrix storage formats that CUSP supports, together with their spMV algorithms. All tests were run on an NVIDIA GTX280 series GPU. The results show that bDIA reaches about 60 Gflops in single precision and about 30 Gflops in double precision, exceeding that GPU series' floating-point efficiency ceilings for spMV of 4% in single precision and 22.2% in double precision, and nearly doubling the best performance of the other five formats.
Keywords: banded sparse matrix-vector multiplication; bDIA; generalized finite element; GPU
Classification: TP301 [Automation and Computer Technology / Computer System Architecture]
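The abstract does not describe the bDIA layout itself, so the following is only a minimal sketch of the standard DIA (diagonal) storage scheme that the name suggests it builds on, and which is among the formats CUSP supports. The function names `dense_to_dia` and `dia_spmv` are hypothetical and chosen for illustration. The code runs serially on the CPU, whereas a GPU kernel would typically process the rows in parallel.

```python
# Illustrative sketch of plain DIA storage and spMV; this is NOT the paper's bDIA.

def dense_to_dia(matrix):
    """Convert a square dense matrix (list of lists) to DIA storage.

    Returns (offsets, data) where offsets[d] is the diagonal offset k = j - i
    and data[d][i] holds A[i][i + k], padded with 0.0 outside the matrix.
    """
    n = len(matrix)
    offsets = sorted({j - i for i in range(n) for j in range(n) if matrix[i][j] != 0})
    data = []
    for k in offsets:
        data.append([matrix[i][i + k] if 0 <= i + k < n else 0.0 for i in range(n)])
    return offsets, data


def dia_spmv(offsets, data, x):
    """Compute y = A @ x for a matrix stored in DIA form."""
    n = len(x)
    y = [0.0] * n
    for k, diag in zip(offsets, data):
        # Only rows i with 0 <= i + k < n touch a valid column.
        for i in range(max(0, -k), min(n, n - k)):
            y[i] += diag[i] * x[i + k]
    return y


# A 4x4 tridiagonal (band-shaped) matrix, as a finite difference stencil would give.
A = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
x = [1.0, 2.0, 3.0, 4.0]

offsets, data = dense_to_dia(A)
y = dia_spmv(offsets, data, x)
print(offsets)  # [-1, 0, 1]
print(y)        # [0.0, 0.0, 0.0, 5.0]
```

For a band-shaped matrix only a handful of diagonals are stored, and no column indices are needed because the offset of each diagonal implies them. That is why diagonal storage is a natural starting point for this matrix type.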