Design of an Instruction-Set-Controlled Neural Network Accelerator

A programmable accelerator for convolution neural network


Authors: LIU Si-yang, JIANG Jian-fei [1], MAO Zhi-gang [1] (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)

Affiliation: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Source: Microelectronics & Computer, 2021, No. 5, pp. 1-6 (6 pages)

Abstract: With the development of deep learning and neural network technology, hardware accelerators are increasingly used to fully exploit the parallelism of convolutional neural network (CNN) computation, owing to their high speed, low cost, and high fault tolerance. This paper proposes a new algorithm that optimizes a CNN layer by layer and designs a corresponding instruction set. The proposed algorithm can be used to find the best acceleration scheme for different networks given specific computing and storage resources. During optimization, different types of data can be quantized to half precision to reduce memory access. Based on a 40 nm CMOS process and the proposed algorithm, an instruction-set-controlled (programmable) neural network accelerator is designed. At an operating frequency of 200 MHz, the accelerator reaches a peak performance of 416 GOP/s. Inference of the VGG16 network is implemented on the accelerator as a case study, and the latency of the whole network is 116 ms.
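The headline figures in the abstract can be related with a little arithmetic. The sketch below is illustrative only: it derives operations per clock cycle from the stated peak performance and clock frequency, and the maximum number of operations that could run at peak rate within the stated latency. The function and variable names are hypothetical, and the convention that one multiply-accumulate (MAC) counts as two operations is an assumption that the abstract does not confirm.

```python
def ops_per_cycle(peak_ops_per_s: float, clock_hz: float) -> float:
    """Operations completed per clock cycle at peak throughput."""
    return peak_ops_per_s / clock_hz


# Figures quoted in the abstract.
PEAK_OPS_PER_S = 416e9   # 416 GOP/s peak performance
CLOCK_HZ = 200e6         # 200 MHz operating frequency
LATENCY_S = 0.116        # 116 ms whole-network latency for VGG16

ops = ops_per_cycle(PEAK_OPS_PER_S, CLOCK_HZ)

# ASSUMPTION: 1 MAC = 2 operations (one multiply plus one add).
# The abstract does not say how operations are counted.
macs = ops / 2

# Upper bound on operations executable in one inference if the
# accelerator ran at peak rate for the entire latency window.
peak_ops_in_latency = PEAK_OPS_PER_S * LATENCY_S

print(f"operations per cycle: {ops:.0f}")
print(f"MACs per cycle (assuming 2 ops per MAC): {macs:.0f}")
print(f"peak-rate operation budget in 116 ms: {peak_ops_in_latency / 1e9:.3f} GOP")
```

Under the 2-ops-per-MAC assumption, the per-cycle figure suggests the size of the parallel MAC array. The abstract does not describe the actual array organization, so this is a consistency check and not a description of the design.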

Keywords: accelerator; instruction set; dynamic-precision quantization; convolutional neural network

Classification number: TN722.1 [Electronics and Telecommunications: Circuits and Systems]

 
