MRNDA: A Multicast Mechanism for Resource-Constrained NoC-Based Deep Neural Network Accelerators


Authors: OUYANG Yi-ming (欧阳一鸣) [1]; WANG Qi (王奇); TANG Fei-yang (汤飞扬); ZHOU Wu (周武); LI Jian-hua (李建华) [1]

Affiliations: [1] School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230009, China; [2] School of Microelectronics, Hefei University of Technology, Hefei, Anhui 230009, China

Source: Acta Electronica Sinica (《电子学报》), 2024, No. 3, pp. 872-884 (13 pages)

Funding: National Natural Science Foundation of China (No. 61876158, No. 71971151).

Abstract: Networks-on-Chip (NoCs) are widely used in multiprocessor systems. In recent years, NoC-based deep neural network (DNN) accelerators have been proposed, which use an NoC to connect neural computing devices. Such designs greatly reduce the accelerator's off-chip memory accesses and thereby its classification latency and power consumption. With a traditional unicast NoC, however, the large number of one-to-many packets significantly increases the accelerator's communication latency. Moreover, current DNNs are often very large, while the number of cores in an NoC is limited. This paper therefore proposes MRNDA, a multicast mechanism for resource-constrained NoC-based DNN accelerators. The scheme uses a limited number of processor elements (PEs) to compute large DNNs, and uses a special tree-based multicast acceleration network to reduce the accelerator's communication latency. Simulation results show that, compared with the baseline, the proposed multicast mechanism reduces the accelerator's classification latency by up to 86.7% and its communication latency by up to 88.8%, while its router area and power amount to only 9.5% and 10.3% of those of the baseline router.
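The abstract's point about one-to-many traffic can be illustrated with a small sketch. It is not the MRNDA design, whose details the abstract does not give. It assumes a generic 2D mesh with XY (dimension-order) routing, and the function names `xy_path`, `unicast_link_traversals`, and `tree_multicast_link_traversals` are hypothetical. It counts how many link traversals are needed to deliver one packet from a source to many destinations, first with separate unicast packets and then with a tree in which each shared link carries the packet once.

```python
def xy_path(src, dst):
    """Directed links on the XY (dimension-order) route from src to dst."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:                      # travel along the X dimension first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                      # then along the Y dimension
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def unicast_link_traversals(src, dests):
    """One separate packet per destination: every hop is paid for each time."""
    return sum(len(xy_path(src, d)) for d in dests)


def tree_multicast_link_traversals(src, dests):
    """Assumed tree multicast: the union of the XY routes forms a tree,
    and the packet crosses each link of that tree exactly once."""
    links = set()
    for d in dests:
        links.update(xy_path(src, d))
    return len(links)


# Illustrative case (an assumption, not from the paper): 4x4 mesh,
# node (0, 0) sends the same data to all 15 other nodes.
source = (0, 0)
destinations = [(x, y) for x in range(4) for y in range(4) if (x, y) != source]
print(unicast_link_traversals(source, destinations))         # 48
print(tree_multicast_link_traversals(source, destinations))  # 15
```

In this toy case the tree needs 15 link traversals instead of 48. The figures only illustrate why one-to-many traffic is costly under unicast and say nothing about the latency reductions reported in the paper.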

Keywords: network-on-chip; deep neural network accelerator; multicast; router architecture; multiple physical networks

Classification code: TP302 [Automation and Computer Technology: Computer System Architecture]

 
