Design of a hardware microarchitecture for a resource-efficient matrix operation unit

Authors: PAN Yu; TIAN Yinghui; ZHANG Wei; YANG Jianlei; SHEN Qi (Hygon Information Technology Co., Ltd., Beijing 100193, China; Beihang University, Beijing 100191, China; China Unicom Smart City Research Institute, Beijing 100037, China)

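Affiliations: [1] Hygon Information Technology Co., Ltd., Beijing 100193; [2] Beihang University, Beijing 100191; [3] China Unicom Smart City Research Institute, Beijing 100037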

Source: Modern Electronics Technique, 2024, No. 5, pp. 160-166 (7 pages)

Abstract: Fast computation for artificial intelligence (AI) and high-performance computing (HPC) across application domains relies on acceleration by an AI accelerator (NPU, neural processing unit) or a general-purpose graphics processing unit (GPGPU). Because matrix operations are the core computation of both AI and HPC, this paper proposes an implementation scheme for a resource-efficient matrix operation unit architecture. Matrix operations are accelerated by increasing the number of multipliers and adders in each sub-unit of the matrix operation unit and by broadcasting the input data by row and column to the sub-units. By exploiting data sharing between PE matrices and adopting a new PE matrix interconnection scheme, the design reduces bandwidth requirements while increasing computing power. Compared with existing matrix operation implementations in NPUs or GPGPUs, the proposed scheme achieves the same computing power with fewer adders and registers, and completes the acceleration of matrix operations of the same size with lower clock latency and lower bandwidth consumption.
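The row and column broadcast described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's microarchitecture: the function name `pe_array_matmul`, the parameters `rows`, `cols`, and `lanes`, and the one-cycle-per-step counting are all hypothetical choices made for illustration. Each modeled PE has `lanes` multipliers feeding an adder tree and one accumulator; a slice of a row of A is shared by every PE in a PE row, and a slice of a column of B is shared by every PE in a PE column.

```python
def pe_array_matmul(a, b, rows, cols, lanes):
    """Multiply a (M x K) by b (K x N) on a hypothetical rows x cols PE array.

    Assumption: each PE holds `lanes` multipliers plus an adder tree and
    one accumulator, and each broadcast step is counted as one cycle.
    Returns the product matrix and the number of modeled cycles.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    cycles = 0
    for i0 in range(0, m, rows):              # tile of output rows
        for j0 in range(0, n, cols):          # tile of output columns
            for k0 in range(0, k, lanes):     # one step per inner-dimension slice
                cycles += 1
                for r in range(min(rows, m - i0)):
                    # Slice of one row of A, broadcast to every PE in PE row r.
                    a_slice = a[i0 + r][k0:k0 + lanes]
                    for s in range(min(cols, n - j0)):
                        # Slice of one column of B, broadcast to every PE in PE column s.
                        b_slice = [b[kk][j0 + s]
                                   for kk in range(k0, min(k0 + lanes, k))]
                        # `lanes` multiplications reduced by the PE's adder tree,
                        # then added to the PE's accumulator.
                        c[i0 + r][j0 + s] += sum(x * y for x, y in zip(a_slice, b_slice))
    return c, cycles


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
result, cycles = pe_array_matmul(a, b, rows=2, cols=2, lanes=2)
print(result, cycles)
```

In this model, widening each PE (a larger `lanes`) reduces the number of steps for the same matrix size, while broadcasting means each step fetches `rows * lanes` operands of A and `cols * lanes` operands of B rather than a private pair of slices per PE.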

Keywords: artificial intelligence; high-performance computing; matrix operation; resource saving; low clock latency; GPGPU

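Classification: TN02-34 [Electronics and Telecommunications: Physical Electronics]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]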
