Authors: WANG Xiaofeng; JIANG Penglong[1,2]; ZHOU Hui[1,2]; ZHAO Xiongbo
Affiliations: [1] Beijing Aerospace Automatic Control Institute, Beijing 100854, China; [2] National Key Laboratory of Science and Technology on Aerospace Intelligence Control, Beijing 100854, China
Source: Journal of Computer Applications (《计算机应用》), 2021, No. 3, pp. 812-819 (8 pages)
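Funding: Military Scientific Research Funding Project; Innovation Research and Development Project of the China Academy of Launch Vehicle Technology.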
Abstract: Most algorithms based on Convolutional Neural Networks (CNNs) are computation-intensive and memory-intensive, which makes them difficult to apply in embedded fields with low-power requirements, such as aerospace, mobile robotics, and smartphones. To address this problem, a Field Programmable Gate Array (FPGA) accelerator with high parallelism for CNNs is proposed. First, the four kinds of parallelism in CNN algorithms that can be exploited for FPGA acceleration are compared and studied. Then, a Multi-channel Convolutional Rotating-register Pipeline (MCRP) structure is proposed, which exploits the intra-kernel (convolution kernel) parallelism of CNN algorithms concisely and effectively. Finally, combining input/output channel parallelism with intra-kernel parallelism, a highly parallel CNN accelerator architecture based on the MCRP structure is proposed; to verify the rationality of the design, it is deployed on the Xilinx XCZU9EG chip. With the on-chip Digital Signal Processor (DSP) resources fully used, the peak computing capacity of the accelerator reaches 2,304 GOPS (Giga Operations Per Second). With the SSD-300 algorithm as the test object, the accelerator achieves an actual computing capacity of 1,830.33 GOPS, corresponding to a hardware utilization of 79.44%. Experimental results show that the MCRP structure effectively improves the computing capacity of the CNN accelerator, and that a CNN accelerator based on the MCRP structure can largely meet the computing capacity requirements of most embedded applications.
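The utilization figure in the abstract can be cross-checked against the two throughput figures. The sketch below assumes that hardware utilization is defined as actual throughput divided by peak throughput; the abstract does not state the definition, but this one agrees with the reported numbers. The function name `hardware_utilization` is hypothetical and used only for illustration.

```python
def hardware_utilization(actual_gops: float, peak_gops: float) -> float:
    """Return achieved throughput as a fraction of peak throughput.

    Assumption: utilization = actual / peak. The abstract does not give
    the formula; this definition is consistent with its reported values.
    """
    if peak_gops <= 0:
        raise ValueError("peak_gops must be positive")
    return actual_gops / peak_gops


PEAK_GOPS = 2304.0     # peak computing capacity reported in the abstract
ACTUAL_GOPS = 1830.33  # measured on SSD-300, as reported in the abstract

utilization = hardware_utilization(ACTUAL_GOPS, PEAK_GOPS)
print(f"{utilization:.2%}")  # prints 79.44%
```

Under this assumed definition, the three reported figures are mutually consistent.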
Keywords: convolutional neural network; high performance; hardware accelerator; parallelism; Field Programmable Gate Array (FPGA)
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]