Authors: PAN Yu; TIAN Yinghui; ZHANG Wei; YANG Jianlei; SHEN Qi
Affiliations: [1] Hygon Information Technology Co., Ltd., Beijing 100193, China [2] Beihang University, Beijing 100191, China [3] China Unicom Smart City Research Institute, Beijing 100037, China
Source: Modern Electronics Technique, 2024, No. 5, pp. 160-166 (7 pages)
Abstract: Fast computation for artificial intelligence (AI) and high-performance computing (HPC) across application domains requires acceleration with an AI accelerator (NPU, neural processing unit) or a general-purpose graphics processing unit (GPGPU). Because matrix operations are the core workload of both AI and HPC, this paper proposes a resource-efficient implementation of a matrix operation unit architecture. Matrix operations are accelerated by increasing the number of multipliers and adders in each sub-unit of the matrix operation unit and by broadcasting the input data, by row and by column, to every sub-unit. By exploiting data sharing between PE (processing element) matrices and adopting a new PE-matrix interconnection scheme, the design reduces bandwidth requirements while increasing computing power. Compared with existing NPU and GPGPU matrix-operation implementations, the proposed scheme achieves the same computing power with fewer adders and registers, and accelerates matrix operations of the same size with lower clock latency and lower bandwidth consumption.
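The abstract does not give the grid dimensions, the number of multipliers per sub-unit, or the details of the PE-matrix interconnect, so the Python sketch below models only the generic idea it describes: each sub-unit is widened to several multipliers feeding an adder tree, and operands are broadcast so that A is shared along PE rows and B along PE columns. The function `pe_grid_matmul` and its parameters `rows`, `cols`, and `lanes` are hypothetical illustrations, not the authors' design; the sketch computes C = A × B while counting cycles and operand slices fetched from the buffer.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def pe_grid_matmul(A: Matrix, B: Matrix, rows: int, cols: int,
                   lanes: int) -> Tuple[Matrix, int, int]:
    """Simulate C = A @ B on a hypothetical rows x cols grid of PEs.

    Assumptions (illustrative only): each PE computes one output element
    per tile using `lanes` multipliers feeding an adder tree and a single
    accumulator register. Every cycle, each PE row receives one broadcast
    slice of A and each PE column receives one broadcast slice of B.
    Returns (C, cycles, operand_slices_fetched).
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    assert len(B) == K, "inner dimensions must match"
    C = [[0.0] * N for _ in range(M)]
    cycles = 0
    fetched = 0  # operand slices read from the buffer, one per active bus per cycle
    for i0 in range(0, M, rows):          # tile over output rows
        for j0 in range(0, N, cols):      # tile over output columns
            acc = [[0.0] * cols for _ in range(rows)]  # one accumulator per PE
            for k0 in range(0, K, lanes):  # one cycle per K-slice of width `lanes`
                k1 = min(k0 + lanes, K)
                # Row buses: PE row r sees A[i0 + r][k0:k1] (None if out of range).
                row_bus: List[Optional[List[float]]] = [
                    A[i][k0:k1] if i < M else None for i in range(i0, i0 + rows)]
                # Column buses: PE column c sees B[k0:k1][j0 + c] (None if out of range).
                col_bus: List[Optional[List[float]]] = [
                    [B[k][j] for k in range(k0, k1)] if j < N else None
                    for j in range(j0, j0 + cols)]
                fetched += sum(v is not None for v in row_bus + col_bus)
                for r, a_vec in enumerate(row_bus):
                    if a_vec is None:
                        continue
                    for c, b_vec in enumerate(col_bus):
                        if b_vec is None:
                            continue
                        # `lanes` parallel multiplies; sum() stands in for the adder tree.
                        acc[r][c] += sum(x * y for x, y in zip(a_vec, b_vec))
                cycles += 1
            # Write back only the PEs that mapped to valid output elements.
            for r in range(min(rows, M - i0)):
                for c in range(min(cols, N - j0)):
                    C[i0 + r][j0 + c] = acc[r][c]
    return C, cycles, fetched


A = [[float(i * 8 + k) for k in range(8)] for i in range(4)]    # 4x8
B = [[float((k + j) % 5) for j in range(4)] for k in range(8)]  # 8x4
C, cycles, fetched = pe_grid_matmul(A, B, rows=2, cols=2, lanes=4)
print(cycles, fetched)  # 8 32
```

In this toy 2×2 grid, broadcasting needs rows + cols = 4 operand slices per cycle, whereas having every PE fetch its own A and B slices would need 2 × rows × cols = 8. This illustrates the general kind of data sharing the abstract refers to; the paper's actual interconnection scheme may differ.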
Keywords: artificial intelligence; high-performance computing; matrix operation; resource saving; low clock latency; GPGPU
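CLC classification: TN02-34 [Electronics and Telecommunications: Physical Electronics]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]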