A systolic array optimization strategy for switching between matrix blocks in advance


Authors: JU Xin; CAO Ya-song; WEN Mei[1]; WANG Zhi; FENG Jing (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

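Affiliation: [1] College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China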

Source: Computer Engineering & Science, 2023, No. 1, pp. 1-9 (9 pages)

Funding: National Natural Science Foundation of China (62002366).

Abstract: The hardware computing power demanded by AI applications grows year by year, driving AI accelerators toward ever higher performance. Research shows that the dominant computations in AI applications can be transformed into matrix multiplication, and the systolic array has become one of the mainstream matrix multiplication acceleration techniques because of its particular advantages for this operation. However, feeding a matrix into a systolic array and draining it out incurs pipeline fill and drain overhead. This is especially costly for floating-point systolic arrays that support training, whose MAC latency is often greater than 1: if the switch between matrix blocks is not made in time, PE utilization drops sharply. To address this problem, the paper carries out a theoretical analysis based on typical application scenarios and proposes an early switching strategy between matrix blocks that can precisely calculate the optimal switching moment in each case. An RTL design is also implemented. Experimental comparison shows that the optimized systolic array adds negligible hardware overhead while improving performance in all scenarios.
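To make the utilization argument concrete, the following is a minimal sketch under a simplified analytical model. The model is an assumption for illustration only and is not taken from the paper: each matrix block needs `k` useful MAC steps per PE, the skewed data flow costs `(rows - 1) + (cols - 1)` fill/drain cycles, and a MAC unit with latency `mac_latency` adds `mac_latency - 1` more. All function and parameter names are hypothetical. The "overlapped" case is an idealized upper bound in which the next block is injected while the previous one is still draining, so the overhead is paid only once.

```python
def block_overhead(rows, cols, mac_latency=1):
    """Fill/drain cycles for one block in the assumed toy model."""
    return (rows - 1) + (cols - 1) + (mac_latency - 1)


def utilization_serial(rows, cols, k, n_blocks, mac_latency=1):
    """PE utilization when every block waits for the previous one to drain."""
    useful = n_blocks * k
    total = n_blocks * (k + block_overhead(rows, cols, mac_latency))
    return useful / total


def utilization_overlapped(rows, cols, k, n_blocks, mac_latency=1):
    """Idealized bound: blocks are switched early, overhead is paid once."""
    useful = n_blocks * k
    total = n_blocks * k + block_overhead(rows, cols, mac_latency)
    return useful / total


if __name__ == "__main__":
    # Hypothetical 4x4 array, 10 blocks, 4 useful steps per block.
    for latency in (1, 4):
        s = utilization_serial(4, 4, 4, 10, latency)
        o = utilization_overlapped(4, 4, 4, 10, latency)
        print(f"MAC latency {latency}: serial {s:.3f}, overlapped {o:.3f}")
```

In this toy model a larger MAC latency lowers the serial utilization further, which matches the abstract's qualitative point that late switching is most costly for floating-point arrays. How the paper actually computes the optimal switching moment is not stated in the abstract and is not reproduced here.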

Keywords: systolic array; AI; matrix multiplication; accelerator; PE utilization

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]

 
