Accelerating parallel reduction and scan primitives on ReRAM-based architectures


Authors: JIN Zhou; DUAN Yiru; YI Enxin; JI Haonan; LIU Weifeng[1] (College of Information Science and Engineering, China University of Petroleum, Beijing 102249, China)

Affiliation: [1] College of Information Science and Engineering, China University of Petroleum (Beijing), Beijing 102249, China

Source: Journal of National University of Defense Technology (《国防科技大学学报》), 2022, No. 5, pp. 80-91 (12 pages)

Funding: National Natural Science Foundation of China (61972415); Open Project of the State Key Laboratory of Computer Architecture (CARCHA202115).

Abstract: Reduction and scan are core primitives in parallel computing, so accelerating them is critical. However, the data movement that is unavoidable in the von Neumann architecture exposes them to performance and energy bottlenecks such as the "memory wall". Recently, processing-in-memory (PIM) architectures built on non-volatile memory (NVM) such as resistive random access memory (ReRAM) have enabled in-situ computing without data movement: a ReRAM crossbar performs a matrix-vector multiplication (GEMV) in a single step. Such architectures have already shown great potential in applications such as machine learning and graph computing. This paper proposes parallel acceleration methods for reduction and scan on ReRAM-based PIM architectures. It focuses on expressing the computation as GEMV operations and on mapping that computation onto ReRAM crossbars, realizing a software-hardware co-design that reduces power consumption and improves performance. Compared with a GPU, the proposed reduction and scan primitives achieve speedups of up to two orders of magnitude, and the average speedup also reaches two orders of magnitude. Segmented reduction and scan achieve speedups of up to five (four on average) orders of magnitude, and power consumption is reduced by 79%.
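The abstract describes recasting reduction and scan as GEMV operations so that a ReRAM crossbar can evaluate them in one step, but it does not spell out the matrices. The sketch below shows one standard formulation, offered as an assumption rather than the authors' exact mapping. Reduction multiplies the data vector by an all-ones row. An inclusive scan uses a lower-triangular all-ones matrix. The segmented variants keep only the ones that connect elements of the same segment. In a crossbar, such a fixed 0/1 matrix would be programmed once as cell conductances, while each data vector is applied as input voltages. The helper names (gemv, scan_matrix, segmented_scan_matrix, etc.) are hypothetical, and the plain-Python gemv only stands in for the analog crossbar operation.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def gemv(A: Matrix, x: Sequence[int]) -> List[int]:
    """Plain y = A @ x; a software stand-in for one crossbar GEMV step (hypothetical helper)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def reduction_matrix(n: int) -> Matrix:
    # 1 x n all-ones row: y[0] = x[0] + ... + x[n-1]
    return [[1] * n]


def scan_matrix(n: int) -> Matrix:
    # n x n lower-triangular all-ones matrix: y[i] = x[0] + ... + x[i] (inclusive scan)
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def segmented_scan_matrix(seg_ids: Sequence[int]) -> Matrix:
    # Lower-triangular, restricted to pairs in the same segment;
    # block-diagonal when segments are contiguous
    n = len(seg_ids)
    return [[1 if j <= i and seg_ids[j] == seg_ids[i] else 0 for j in range(n)]
            for i in range(n)]


def segmented_reduction_matrix(seg_ids: Sequence[int]) -> Matrix:
    # One row per segment (in first-appearance order); row k selects segment k's elements
    segments = list(dict.fromkeys(seg_ids))
    return [[1 if s == k else 0 for s in seg_ids] for k in segments]


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    seg = [0, 0, 0, 1, 1, 2, 2, 2]
    print(gemv(reduction_matrix(len(x)), x))         # [31]
    print(gemv(scan_matrix(len(x)), x))              # [3, 4, 8, 9, 14, 23, 25, 31]
    print(gemv(segmented_scan_matrix(seg), x))       # [3, 4, 8, 1, 6, 9, 11, 17]
    print(gemv(segmented_reduction_matrix(seg), x))  # [8, 6, 17]
```

An n x n scan matrix quickly exceeds the size of a single physical crossbar. A real design would therefore tile the input across several crossbars and combine the partial results. The abstract does not state how the paper handles this.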

Keywords: reduction; scan; ReRAM; processing-in-memory architecture; parallel computing

CLC number: TN95 [Electronics and Telecommunications: Signal and Information Processing]
