FPGA Accelerator Design for Hybrid Precision Frequency Domain Convolutional Neural Network

Cited by: 1

Authors: CHEN Yi; LIU Bosheng; XU Yongqi; WU Jigang [1]

Affiliation: [1] School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China

Source: Computer Engineering (《计算机工程》), 2023, No. 12, pp. 1-9 (9 pages)

Funding: National Natural Science Foundation of China (62072118).

Abstract: Deep Convolutional Neural Networks (CNNs) have large models and high computational complexity, which makes them difficult to deploy on Field Programmable Gate Arrays (FPGAs) with limited hardware resources. Hybrid precision CNNs trade off model size against accuracy and thus offer an effective way to reduce a model's memory footprint. The Fast Fourier Transform (FFT), as a fast algorithm, converts traditional spatial domain CNNs into the frequency domain, effectively reducing the computational complexity of the model. This study presents an FPGA-based accelerator design for 8 bit and 16 bit hybrid precision frequency domain CNNs. The accelerator supports dynamic configuration of 8 bit and 16 bit frequency domain convolutions and packs 8 bit frequency domain multiplications to reuse DSPs, which improves computational performance. First, a DSP-based Frequency-domain Processing Element (FPE) is designed that supports 8 bit and 16 bit frequency domain convolution operations and packs a pair of 8 bit frequency domain multiplications to reuse DSPs, boosting throughput. Second, a mapping dataflow is proposed that supports both 8 bit and 16 bit computation patterns and uses data reuse to minimize redundant data processing and data movement. Finally, the accelerator is evaluated on the ResNet-18 and VGG16 models using the ImageNet dataset. The experimental results show that the accelerator achieves energy efficiency ratios (ratio of GOP to energy consumption) of 29.74 and 56.73 on the ResNet-18 and VGG16 models, respectively, a 1.2 to 6.0 times improvement over frequency domain FPGA accelerators.
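The abstract does not spell out how a pair of 8 bit frequency domain multiplications is packed onto one DSP. The sketch below illustrates one well-known general idea, not the paper's actual scheme: two signed 8 bit operands that share a multiplier are placed in separate bit fields of one wide operand, so a single wide multiplication yields both products. The 18 bit field width, the function names `packed_mul` and `complex_mul_int8`, and the use of the trick for a complex (frequency domain) multiplication are all assumptions made for illustration.

```python
SHIFT = 18  # assumed field width; wide enough to hold any signed 8x8 bit product


def packed_mul(a, b, c):
    """Return (a*c, b*c) for signed 8 bit a, b, c using ONE wide multiplication.

    Hypothetical helper: models packing two operands into one wide multiplier input.
    """
    for v in (a, b, c):
        if not -128 <= v <= 127:
            raise ValueError("operands must be signed 8 bit values")
    packed = (a << SHIFT) + b          # a in the upper field, b in the lower field
    product = packed * c               # the single wide multiplication
    low = product & ((1 << SHIFT) - 1) # lower field holds b*c (two's complement)
    if low >= 1 << (SHIFT - 1):        # sign-extend the lower field
        low -= 1 << SHIFT
    high = (product - low) >> SHIFT    # remove b*c, then recover a*c
    return high, low


def complex_mul_int8(x, w):
    """Multiply complex x = (xr, xi) by w = (wr, wi) with two packed multiplications.

    A plain complex product needs four real multiplications; here each packed
    multiplication supplies two of them because xr and xi share the same weight part.
    """
    xr, xi = x
    wr, wi = w
    xr_wr, xi_wr = packed_mul(xr, xi, wr)
    xr_wi, xi_wi = packed_mul(xr, xi, wi)
    return xr_wr - xi_wi, xr_wi + xi_wr


if __name__ == "__main__":
    print(packed_mul(-128, 127, -128))         # both products from one multiplication
    print(complex_mul_int8((3, -4), (5, 6)))   # (3 - 4j) * (5 + 6j)
```

The sign correction on the lower field is what makes the scheme work for signed operands: without it, a negative `b*c` would borrow from the upper field and corrupt `a*c`.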

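Keywords: convolutional neural network; hardware accelerator; frequency domain; hybrid precision; Field Programmable Gate Array (FPGA)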

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
