Authors: LAI Jiawei, WEI Hongjian, SUN Kexue [1,2], WANG Yan
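Affiliations: [1] College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; [2] Nation-Local Joint Project Engineering Lab of RF Integration & Micropackage, Nanjing 210023, Jiangsu, China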
Source: Electronic Design Engineering (《电子设计工程》), 2024, No. 17, pp. 16-21 (6 pages)
Funding: Jiangsu Province Postgraduate Research Innovation Program (SJCX22_0255).
Abstract: Traditional convolutional neural networks have high computational complexity and long run times, which makes them hard to deploy on embedded mobile devices. To address this, the paper proposes a neural network acceleration system that combines an FPGA with an ARM processor, using a ZYNQ chip as the main controller. The PL part of the system is developed in pure RTL. The input and output layers of the convolutional layer are fully parallelized and the convolution window is fully unrolled, so that 81 multiplications complete simultaneously in one clock cycle; the pooling and fully connected layers are optimized with pipelining. Compared with the common approach of optimizing through high-level synthesis tools, this system designs each module of the convolutional neural network from scratch in RTL and applies fine-grained optimization, which avoids generating redundant logic and makes full use of on-chip resources. For a network model for MNIST handwritten digit recognition, the system reaches a DSP utilization of 95%; at a clock frequency of 100 MHz, the hardware processing time for a single image frame is only 0.81 ms, and the power consumption is only 1.601 W.
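The figures reported in the abstract can be related to each other with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the inputs are the three numbers quoted in the abstract, and the multiplication count is merely an upper bound that assumes the 81 parallel multipliers are busy on every cycle of the frame, which the abstract does not claim.

```python
# Back-of-the-envelope figures derived from the abstract's reported numbers.
# All function names are hypothetical helpers, not part of the paper.

def cycles_per_frame(frame_time_s: float, clock_hz: float) -> float:
    """Clock cycles elapsed while one frame is processed."""
    return frame_time_s * clock_hz

def frames_per_second(frame_time_s: float) -> float:
    """Throughput if frames are processed back to back."""
    return 1.0 / frame_time_s

def energy_per_frame_j(power_w: float, frame_time_s: float) -> float:
    """Energy spent on one frame, assuming the reported power is constant."""
    return power_w * frame_time_s

FRAME_TIME_S = 0.81e-3   # 0.81 ms per frame (from the abstract)
CLOCK_HZ = 100e6         # 100 MHz clock (from the abstract)
POWER_W = 1.601          # 1.601 W (from the abstract)
PARALLEL_MULTS = 81      # multiplications per clock cycle (from the abstract)

cycles = cycles_per_frame(FRAME_TIME_S, CLOCK_HZ)
fps = frames_per_second(FRAME_TIME_S)
energy = energy_per_frame_j(POWER_W, FRAME_TIME_S)

# Upper bound only: assumes every cycle keeps all 81 multipliers busy.
max_mults = PARALLEL_MULTS * round(cycles)

print(f"cycles per frame:        {round(cycles)}")
print(f"frames per second:       {fps:.1f}")
print(f"energy per frame (mJ):   {energy * 1e3:.3f}")
print(f"max multiplies per frame: {max_mults}")
```

Under these assumptions, one frame takes about 81,000 clock cycles, which corresponds to roughly 1,235 frames per second and about 1.3 mJ per frame.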
Keywords: PYNQ; ARM processor; neural network; field-programmable gate array (FPGA); hardware accelerator
Classification: TP274.2 [Automation and Computer Technology: Detection Technology and Automatic Devices]