Authors: HUA Qianhao (画芊昊); LI Bo (李博) [1]; DU Chengang (杜宸罡)
Affiliation: [1] Key Laboratory of Instrumental Science and Dynamic Testing, Ministry of Education, North University of China, Taiyuan 030051, China
Source: Computer Measurement & Control (《计算机测量与控制》), 2024, No. 5, pp. 267-273 (7 pages)
Abstract: A low-power depthwise separable convolution accelerator core based on an FPGA is designed. Exploiting the operations common to pointwise (PW) and depthwise (DW) convolution, a fixed multiplier array computes both kinds of convolution by changing how the feature and weight data streams are fed in, which maximizes DSP utilization. To address possible sign-bit overflow in 8-bit asymmetric quantization, the dual-multiplier structure is repackaged so that the sign bit is handled separately. A 7-stage intra-layer pipeline maintains the parallelism of data processing in every cycle. The accelerator is deployed on a Zynq UltraScale+ series FPGA. Experiments show that the proposed structure speeds up network inference while reducing dependence on on-chip resources and overall power consumption: the unmodified MobilenetV2 reaches an average throughput of 130.6 GOPS on the proposed accelerator at an overall power of only 4.1 W, which meets the requirements of real-time edge computing. Compared with other hardware platforms, energy efficiency is markedly improved; compared with FPGA accelerators of the same type, the design has advantages in performance density (GOPS/LUT), power efficiency (GOPS/W), and DSP efficiency (GOPS/DSP).
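Keywords: FPGA; hardware accelerator; convolutional neural network; asymmetric quantization; Mobilenet
Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]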
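
Note on the dual-multiplier idea: the abstract mentions packing two multiplications into one multiplier while handling the sign bit separately, but this record does not give the circuit. The Python sketch below is only one plausible reading, not the authors' design. It assumes operands are zero-point-corrected values in the range -255..255 (an 8-bit quantized value minus an 8-bit zero point), strips the signs, packs two weight magnitudes into one wide operand with an 18-bit gap, performs a single unsigned multiplication by the shared activation magnitude, and reapplies the signs afterwards. The names `packed_dual_multiply`, `split_sign`, `SHIFT`, and `MASK`, as well as the 18-bit gap, are hypothetical choices made for this illustration.

```python
# Illustrative sketch only: not the circuit described in the paper.
SHIFT = 18                # assumed gap: |w| * |a| <= 255 * 255 = 65025 < 2**18
MASK = (1 << SHIFT) - 1   # selects the low product field


def split_sign(x):
    """Return (sign_bit, magnitude) for a signed integer."""
    return (1 if x < 0 else 0, abs(x))


def packed_dual_multiply(a, w1, w2):
    """Compute (a * w1, a * w2) using one wide unsigned multiplication."""
    for v in (a, w1, w2):
        if not -255 <= v <= 255:
            raise ValueError("operand outside the assumed -255..255 range")
    sa, ma = split_sign(a)
    s1, m1 = split_sign(w1)
    s2, m2 = split_sign(w2)

    packed = (m1 << SHIFT) | m2   # two magnitudes share one wide operand
    product = packed * ma         # the single wide multiply

    hi = product >> SHIFT         # equals m1 * ma (no carry from the low field)
    lo = product & MASK           # equals m2 * ma

    # Signs are restored outside the multiplier: XOR of the operand sign bits.
    p1 = -hi if (sa ^ s1) else hi
    p2 = -lo if (sa ^ s2) else lo
    return p1, p2
```

Because the multiplier only ever sees non-negative magnitudes, the low product can never borrow from or carry into the high product, which is the kind of cross-field corruption that sign handling has to prevent when signed values are packed directly.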