Design of heterogeneous FPGA hardware accelerator based on CNN

Authors: JI Haolin (籍浩林); XU Wei (徐伟) [1,3]; PIAO Yongjie (朴永杰) [1,3]; WU Xiaobin (吴晓斌); GAO Tan (高倓)

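Affiliations: [1] Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, Jilin 130033, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] Key Laboratory of Space-Based Dynamic & Rapid Optical Imaging Technology, Chinese Academy of Sciences, Changchun, Jilin 130033, China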

Source: Chinese Journal of Liquid Crystals and Displays (《液晶与显示》), 2025, No. 3, pp. 448-456 (9 pages)

Funding: Innovation Workstation Development Fund of the Qian Xuesen Laboratory of Space Technology (No. GZZKFJJ2020003).

Abstract: Embedded hardware platforms offer limited computing power and storage. Implementing energy-efficient, high-performance convolutional neural networks (CNNs) on such systems therefore remains a major challenge for hardware designers. This paper presents a complete design of a heterogeneous embedded system built on a field-programmable gate array (FPGA) system-on-chip (SoC). The design uses a cascadable input-reuse structure and performs two independent multiply-accumulate (MAC) operations simultaneously in a single DSP slice. This reduces external memory accesses, improves system efficiency, and lowers power consumption. As a result, its power efficiency is more than 38.7% higher than that of the other designs compared. The framework was successfully deployed to run large-scale CNNs on low-cost devices, substantially improving the power efficiency of the network models. On a ZYNQ XC7Z045 device it reaches a power efficiency of up to 102 GOPS/W. When the framework is used for inference of the convolutional layers of VGG-16, it achieves a frame rate of up to 10.9 fps. These results show that the design can effectively accelerate CNN inference in power-constrained environments.
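The abstract's central trick is performing two independent multiply-accumulates in one DSP slice, with an operand shared between them. It can be illustrated with the operand-packing technique Xilinx documents for INT8 inference on its DSP slices. The sketch below is a bit-accurate Python model built on our own assumptions. It is not the authors' RTL, and the following are all assumptions: the hypothetical function `dual_mac` and its parameters, signed 8-bit weights and activations, and two MACs that share the same input activation (our reading of "input reuse"). The port widths default to the 27×18 multiplier and 48-bit accumulator of an UltraScale DSP48E2. The XC7Z045 used in the paper has 7-series DSP48E1 slices with a 25×18 multiplier, so a real design on that device would need a narrower shift or narrower operands. This model does not attempt that.

```python
INT8_MIN, INT8_MAX = -128, 127


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable in `bits`-bit two's complement."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def dual_mac(x, w_hi, w_lo, shift=18, a_bits=27, b_bits=18, acc_bits=48):
    """Return (sum(x*w_hi), sum(x*w_lo)) using one multiply per input x[i].

    Hypothetical model: the two int8 weights that share input x[i] are packed
    into one operand, (w_hi << shift) + w_lo, so a single a_bits x b_bits
    signed multiply yields both products; results accumulate in an
    acc_bits-wide register, as in a DSP slice's MAC path.
    """
    assert len(x) == len(w_hi) == len(w_lo)
    # Largest int8*int8 product is (-128)*(-128) = 16384; the low field must
    # stay inside a signed `shift`-bit range, which bounds the chain length.
    max_terms = ((1 << (shift - 1)) - 1) // (INT8_MIN * INT8_MIN)
    assert len(x) <= max_terms, "too many products: low field could overflow"

    acc = 0
    for xi, wh, wl in zip(x, w_hi, w_lo):
        assert all(INT8_MIN <= v <= INT8_MAX for v in (xi, wh, wl))
        packed = (wh << shift) + wl          # two weights in one operand
        assert fits_signed(packed, a_bits) and fits_signed(xi, b_bits)
        acc += packed * xi                   # one multiply, two products
        assert fits_signed(acc, acc_bits), "accumulator overflow"

    lo = to_signed(acc, shift)   # low field, sign-extended
    hi = (acc - lo) >> shift     # exact: acc - lo == hi * 2**shift
    return hi, lo


if __name__ == "__main__":
    print(dual_mac([3, -5], [2, 7], [-4, 1]))  # (-29, -17)
```

Here `dual_mac([3, -5], [2, 7], [-4, 1])` returns `(-29, -17)`, i.e. 3·2 + (-5)·7 and 3·(-4) + (-5)·1. Subtracting the sign-extended low field before shifting corrects the borrow that a negative low product leaves in the high field. The 18-bit field width limits the chain to seven int8 products before the low sum could spill into the high field. A hardware version would presumably drain or split the accumulation at that point.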

Keywords: hardware acceleration; convolutional neural network; FPGA; heterogeneous SoC

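CLC numbers: TP391 [Automation and Computer Technology / Computer Application Technology]; TP311 [Automation and Computer Technology / Computer Science and Technology]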

 
