Design of convolutional neural network acceleration system based on heterogeneous platform (基于异构平台的卷积神经网络加速系统设计)

Cited by: 4

Authors: QIN Wen-qiang (秦文强); WU Zhong-cheng (吴仲城) [2,3]; ZHANG Jun (张俊); LI Fang (李芳)

Affiliations: [1] Institute of Physical Science and Information Technology, Anhui University, Hefei 230601, Anhui, China; [2] Center for High Magnetic Field Science, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, Anhui, China; [3] High Magnetic Field Laboratory of Anhui Province, Hefei 230031, Anhui, China

Source: Computer Engineering & Science (《计算机工程与科学》), 2024, No. 1, pp. 12-20 (9 pages)

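Funding: Key Research and Development Project of the Hefei Science Center, Chinese Academy of Sciences (2019HSC-KPRD003); Project of the Hefei Comprehensive National Science Center (QGCYY04).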

Abstract: Deploying convolutional neural networks (CNNs) on embedded devices with limited computing and storage resources leads to slow execution, low computational efficiency, and high power consumption. This paper proposes a novel CNN acceleration architecture based on a heterogeneous platform, and designs and implements a lightweight CNN acceleration system based on MobileNet. First, to reduce hardware resource consumption and data transfer cost, the network model is optimized by combining dynamic fixed-point quantization with batch normalization fusion, which also reduces the hardware design complexity of the acceleration system. Second, convolution tiling, parallel convolution computation, and data flow optimization improve the efficiency of the convolution operations and the system throughput. Experimental results on the PYNQ-Z2 platform show that the MobileNet inference scheme implemented by this system recognizes a single image in 0.18 s at a system power consumption of 2.62 W, a 128-fold speedup over a single-core ARM processor.
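The two model-side optimizations named in the abstract, batch normalization fusion and dynamic fixed-point quantization, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the bit width, the granularity at which the fractional length is chosen, or the rounding mode, so the 8-bit width, the per-tensor fractional length, and all function names below (`fold_batch_norm`, `choose_frac_bits`, `quantize`, `dequantize`) are assumptions made for illustration only.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel.
    All other arguments: one value per output channel.
    BN(conv(x)) = gamma * (w.x + b - mean) / sqrt(var + eps) + beta,
    which equals a plain convolution with rescaled weights and bias.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)
    return folded_w, folded_b


def choose_frac_bits(values, total_bits=8):
    """Pick a fractional length so the largest magnitude still fits.

    "Dynamic" here means the fractional length is chosen from the data
    (assumed per tensor) instead of being fixed for the whole network.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits left of the point
    return total_bits - 1 - int_bits               # one bit is the sign


def quantize(values, frac_bits, total_bits=8):
    """Round to signed fixed-point codes and saturate to the bit width."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map fixed-point codes back to real values."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


if __name__ == "__main__":
    w, b = fold_batch_norm([[0.5, -0.25]], [0.1], [2.0], [0.3], [0.2], [0.99],
                           eps=0.01)
    fl = choose_frac_bits(w[0])
    print(w, b, fl, quantize(w[0], fl))
```

Folding removes the batch normalization layer from the inference graph, so the accelerator only needs a convolution datapath, and choosing the fractional length per tensor keeps narrow integer arithmetic accurate across layers whose value ranges differ.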

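Keywords: field-programmable gate array (FPGA); Vivado high-level synthesis; convolutional neural network; heterogeneous platform; hardware acceleration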

Classification code: TP368 [Automation and Computer Technology: Computer System Architecture]

 
