A neural network acceleration system based on PYNQ

Cited by: 1


Authors: LAI Jiawei; WEI Hongjian; SUN Kexue [1,2]; WANG Yan

Affiliations: [1] College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; [2] Nation-Local Joint Project Engineering Lab of RF Integration & Micropackage, Nanjing 210023, Jiangsu, China

Source: Electronic Design Engineering (《电子设计工程》), 2024, No. 17, pp. 16-21 (6 pages)

Funding: Jiangsu Provincial Graduate Research and Innovation Program (SJCX22_0255).

Abstract: Traditional convolutional neural networks are computationally complex and slow, which makes them hard to deploy on embedded mobile devices. To address this, the paper proposes a neural network acceleration system that combines an FPGA with an ARM processor, using a ZYNQ chip as the main controller. The PL part of the system is developed in pure RTL: the input and output layers of the convolutional layer are fully parallelized and the convolution window is fully unrolled, so 81 multiplications complete in a single clock cycle, while the pooling and fully-connected layers are optimized with pipelining. Compared with the common approach of optimizing with high-level synthesis tools, the system designs each module of the convolutional neural network from scratch in RTL and optimizes at a fine granularity, which avoids redundant logic and makes full use of on-chip resources. For a network model for MNIST handwritten digit recognition, the system reaches a DSP utilization of 95%; at a 100 MHz clock frequency, the hardware processes a single frame in only 0.81 ms, with a power consumption of only 1.601 W.
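The headline figures in the abstract can be related to each other with a little arithmetic, and the "fully unrolled window" idea can be mimicked in software. The sketch below is illustrative only and is not the authors' RTL. The four constants come from the abstract; everything else is an assumption, in particular the split of the 81 multiplications into 9 parallel windows of 3×3 (the abstract does not say how the 81 is composed), and the names `derived_figures` and `unrolled_window_step` are hypothetical.

```python
import numpy as np

# Figures quoted in the abstract.
CLOCK_HZ = 100e6         # 100 MHz clock
FRAME_TIME_S = 0.81e-3   # 0.81 ms per frame
POWER_W = 1.601          # 1.601 W
MULTS_PER_CYCLE = 81     # multiplications per clock cycle


def derived_figures():
    """Quantities that follow directly from the quoted figures."""
    return {
        # Clock cycles spent on one frame.
        "cycles_per_frame": round(FRAME_TIME_S * CLOCK_HZ),
        # Frames per second if frames are processed back to back.
        "frames_per_second": 1.0 / FRAME_TIME_S,
        # Energy per frame in millijoules (power x time).
        "energy_per_frame_mj": POWER_W * FRAME_TIME_S * 1e3,
        # Upper bound on multiplications per second; assumes every
        # multiplier is busy on every cycle, which the abstract does not claim.
        "peak_mults_per_second": MULTS_PER_CYCLE * CLOCK_HZ,
    }


def unrolled_window_step(windows, kernels):
    """Software model of one 'clock cycle' of a fully unrolled convolution.

    ASSUMPTION: P parallel K x K windows, e.g. P=9 and K=3 gives 81 products.
    windows, kernels: arrays of shape (P, K, K).
    Returns the P window sums.
    """
    # All P*K*K products are independent, so hardware can form them at once.
    products = windows * kernels
    # One adder tree per window reduces its K*K products to a single value.
    return products.reshape(products.shape[0], -1).sum(axis=1)


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(name, value)
    rng = np.random.default_rng(0)
    w = rng.integers(-8, 8, size=(9, 3, 3))
    k = rng.integers(-8, 8, size=(9, 3, 3))
    print(unrolled_window_step(w, k))
```

Under these assumptions, one frame takes 81,000 clock cycles and roughly 1.3 mJ. In hardware, the element-wise product would map to one multiplier per term, which is why the number of available DSP slices bounds how far the window can be unrolled.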

Keywords: PYNQ; ARM processor; neural network; field-programmable gate array (FPGA); hardware accelerator

CLC number: TP274.2 [Automation and Computer Technology: Detection Technology and Automatic Devices]

 
