Network Structure Optimization and High-Efficiency Implementation of Skynet Based on FPGA


Authors: TANG Wei-wei, ZHONG Sheng[1,2], LU Jin-yi, YAN Lu-xin[1,2], TAN Fu-zhong, ZHOU Xu[1,2], XU Wen-hui[1,2]

Affiliations: [1] School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; [2] National Key Laboratory of Science & Technology on Multi-Spectral Information Processing, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China

Source: Acta Electronica Sinica (电子学报), 2023, No. 2, pp. 314-323 (10 pages)

Funding: National Natural Science Foundation of China (No. 61806081); National Defense Basic Scientific Research Program (No. JCKY2018204B068).

Abstract: Object detection algorithms based on convolutional neural networks (CNNs) are robust and accurate, and are widely used in computer vision tasks. However, the large parameter count and computational cost of CNNs make real-time implementation on edge computing platforms difficult. To address this, this paper optimizes the structure of the object detection network Skynet and implements it in real time on a field-programmable gate array (FPGA), using an efficient intra-layer parallel pipelined acceleration architecture. The method prunes Skynet, merges its convolutional and normalization layers, applies Kullback-Leibler (KL) relative entropy and maximum-value quantization to quantize the weights and feature maps to 8-bit fixed point, converts the bias parameters and scaling coefficients to fixed point, and merges the activation operation with the saturation truncation operation. These steps reduce storage and computation while speeding up forward inference. In addition, building on the sliding-window operation, the design computes channels and pixels in parallel and introduces a pipelining strategy for depthwise separable convolution, turning the serial forward-inference structure into a parallel pipelined one and greatly reducing forward-inference time. Experiments on the UA-DETRAC dataset show that the implemented system achieves a recognition accuracy of 0.752 and a frame rate of 115 FPS at an image resolution of 160×160, which is 11 times faster than the CPU and 75% of the GPU's speed. Its power consumption is 10.6% of the CPU's and 7.43% of the GPU's. Moreover, compared with similar FPGA-based CNN acceleration work, the proposed method performs best in both speed and energy efficiency.
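Three of the steps named in the abstract (merging the convolutional and normalization layers, maximum-value 8-bit quantization, and merging activation with saturation truncation) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names are hypothetical, the activation is assumed to be ReLU-like, quantization is assumed symmetric and per-tensor, and kernels are flattened to plain lists for brevity. The KL-based calibration is not shown.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into the preceding convolution.

    For each output channel, BN computes
        y = gamma * (conv(x) + b - mean) / sqrt(var + eps) + beta,
    which equals a convolution with scaled weights and a new bias.
    `weights` holds one flattened kernel (a list of floats) per channel.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)
    return folded_w, folded_b


def quantize_max_abs(values, num_bits=8):
    """Symmetric maximum-value quantization (assumed scheme).

    Maps the largest magnitude in `values` to the largest signed integer
    and returns the integer codes together with the scale factor.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def relu_saturate(acc, qmax=127):
    """Fused ReLU and saturation: a single clamp to [0, qmax].

    Applying ReLU and then saturating to the signed 8-bit range gives
    the same result as this single clamp, so one comparison stage suffices.
    """
    return max(0, min(qmax, acc))


if __name__ == "__main__":
    w, b = fold_batchnorm([[1.0, -2.0]], [0.0], [3.0], [0.5], [1.0], [4.0], eps=0.0)
    print("folded weights:", w, "folded bias:", b)
    print("quantized:", quantize_max_abs([1.0, -1.0, 0.0]))
    print("fused clamp:", [relu_saturate(a) for a in (-5, 42, 300)])
```

Folding removes the normalization layer from inference entirely, and the single clamp replaces two sequential operations, which is consistent with the abstract's stated aim of reducing computation in the forward pass.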

Keywords: object detection network; fixed-point quantization; field-programmable gate array (FPGA); pipelined computation; Skynet

Classification code: TN47 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]

 
