基于FPGA和行折叠的稀疏矩阵向量乘优化  

Optimization of sparse matrix-vector multiplication based on FPGA and row folding

在线阅读下载全文

作  者:周智 高建花 计卫星[1] ZHOU Zhi;GAO Jian-hua;JI Wei-xing(School of Computer Science&Technology,Beijing Institute of Technology,Beijing 100081,China)

机构地区:[1]北京理工大学计算机学院,北京100081

出  处:《计算机工程与科学》2024年第8期1340-1348,共9页Computer Engineering & Science

摘  要:稀疏矩阵向量乘(SpMV)是科学与工程计算中的一个关键内核。由于稀疏矩阵中不规则的数据分布和SpMV计算中不规则的访存操作,SpMV在多核CPU和GPU等设备上的性能与这些设备的理论峰值还具有较大差距。现有的CPU和GPU由于在架构上受到限制,导致它们无法很好地利用稀疏矩阵的特殊结构来加速SpMV计算,而现场可编程门阵列(FPGA)可以通过自定义电路实现高效的并行运算,能够更好地处理稀疏矩阵的计算和存储问题。基于FPGA提出了一种SpMV优化方法,该优化方法基于高级综合的流式处理引擎,采用了一种自适应多行折叠的SpMV优化策略。该方法通过行折叠减少了处理引擎中零元的无效存储和计算,从而提升了基于FPGA的SpMV计算性能。实验结果表明,相比于现有的FPGA实现方案,设计的基于行折叠优化的数据流引擎实现了最高1.78倍和平均1.15倍的加速。Sparse matrix-vector multiplication(SpMV)is a key kernel in scientific and engineering computing.Due to the irregular data distribution in sparse matrices and the irregular memory access operations in SpMV calculations,the performance of SpMV on multicore CPUs and GPUs still lags significantly behind the theoretical peak performance of these devices.Existing CPUs and GPUs are limited in their architectures,making them unable to effectively utilize the special structure of sparse matrices to accelerate SpMV calculations.However,Field-Programmable gate arrays(FPGA)can achieve efficient parallel computing through customized circuits,which better handle the computation and storage issues of sparse matrices.An SpMV optimization method based on FPGA is proposed,which utilizes a high-level synthesis streaming processing engine and employs an adaptive multi-row folding SpMV optimization strategy.This method reduces the ineffective storage and computation of zero elements in the processing engine through row folding,thereby enhancing the performance of FPGA-based SpMV calculations.Experimental results show that compared to existing FPGA implementations,the proposed row folding-based dataflow engine achieves a maximum speedup of 1.78 times and an average speedup of 1.15 times.

关 键 词:稀疏矩阵向量乘 现场可编程门阵列 高级综合 行折叠 

分 类 号:TP492[自动化与计算机技术] O246[理学—计算数学]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象