FAQ-CNN: A Flexible Acceleration Framework for Quantized Convolutional Neural Networks on Embedded FPGAs


Authors: Xie Kunpeng; Lu Ye; Jin Zongming [1,2]; Liu Yiqing; Gong Cheng; Chen Xinwei; Li Tao [1,2,3] (College of Computer Science, Nankai University, Tianjin 300350; Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350; State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350108)

Affiliations: [1] College of Computer Science, Nankai University, Tianjin 300350; [2] Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350; [3] State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; [4] Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350108

Published in: Journal of Computer Research and Development (《计算机研究与发展》), 2022, Issue 7, pp. 1409-1427 (19 pages)

Funding: National Key R&D Program of China (2018YFB2100304); National Natural Science Foundation of China (62002175); Open Project of the State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences) (CARCHB202016); Tianjin Outstanding Science and Technology Commissioner Project (21YDTPJC00380); Open Fund of the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (MJUKF-IPIC202105); China University Industry-University-Research Innovation Fund (2020HYA01003).

Abstract: Quantization can compress convolutional neural network (CNN) model size and improve computing efficiency. However, existing accelerator designs for quantized CNNs typically face several challenges: diverse quantization algorithms, poor reusability of code modules, inefficient data exchange, and insufficient resource utilization. To address these challenges, we propose FAQ-CNN, a flexible acceleration framework for quantized CNNs on embedded FPGAs, which jointly optimizes accelerator design along three dimensions: computation, communication, and storage. FAQ-CNN supports rapid deployment of quantized CNN models in the form of a software tool. First, a component for quantization algorithms is designed that separates the arithmetic operations of a quantization algorithm from its value-mapping process; optimization techniques such as operator fusion, double buffering, and pipelining are applied to improve the parallel execution efficiency of CNN inference tasks. Second, hierarchical, bitwidth-independent encoding rules and a parallel decoding method are proposed to support efficient batch transfer and parallel computation of low-bitwidth data. Finally, a resource-allocation optimization model is established and transformed into an integer nonlinear programming problem; a heuristic pruning strategy is applied during solving to shrink the design space. Extensive experimental results show that FAQ-CNN can implement a wide range of quantized CNN accelerators efficiently and flexibly. With 16-bit activations and weights, the computing performance of the FAQ-CNN accelerator is 1.4 times that of Caffeine; with an 8-bit configuration, FAQ-CNN achieves up to 1.23 TOPS.
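The decoupling idea in the abstract, separating a quantizer's value mapping from the integer arithmetic it feeds, can be illustrated with a minimal sketch. This is not the FAQ-CNN component itself; the uniform quantizer, function names, and scales below are illustrative assumptions. The key point is that `conv1d_int` operates only on integer codes and never needs to know which quantization algorithm produced them; the value mapping is confined to `quantize`/`dequantize`.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): arithmetic runs on
# integer codes, while the quantization algorithm's value mapping is
# isolated in separate quantize/dequantize steps.

def quantize(x, bits, scale):
    """Value-mapping stage: map real values to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)

def dequantize(q, scale):
    """Inverse value mapping back to real values."""
    return q.astype(np.float32) * scale

def conv1d_int(codes_x, codes_w):
    """Arithmetic stage: pure integer dot products, independent of the
    particular quantization algorithm that produced the codes."""
    n = len(codes_x) - len(codes_w) + 1
    return np.array([np.dot(codes_x[i:i + len(codes_w)], codes_w)
                     for i in range(n)], dtype=np.int64)

x = np.array([0.5, -1.0, 0.25, 0.75], dtype=np.float32)
w = np.array([1.0, -0.5], dtype=np.float32)
sx, sw = 0.25, 0.25                       # per-tensor scales (assumed)
acc = conv1d_int(quantize(x, 8, sx), quantize(w, 8, sw))
y = dequantize(acc, sx * sw)              # single rescale merges both mappings
```

With these scales the round trip is exact: `y` equals the floating-point correlation of `x` and `w`, while all multiply-accumulates ran in integer arithmetic.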
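The batch-transfer problem for low-bitwidth data can also be sketched: sub-byte values are packed into full machine words so a bus moves many values per beat, and each lane can be extracted independently (hence decodable in parallel on hardware). The packing layout below is an assumption for illustration, not the paper's actual encoding rule; note the same routines work unchanged for any bitwidth dividing the word size, which is the spirit of a bitwidth-independent rule.

```python
# Illustrative sketch: pack low-bitwidth values into 32-bit words for bulk
# transfer, then unpack each lane independently. Layout is an assumption,
# not FAQ-CNN's actual encoding.

def pack(values, bits, word_bits=32):
    lanes = word_bits // bits          # values carried per word
    mask = (1 << bits) - 1
    words = []
    for i in range(0, len(values), lanes):
        w = 0
        for lane, v in enumerate(values[i:i + lanes]):
            w |= (v & mask) << (lane * bits)   # place value in its lane
        words.append(w)
    return words

def unpack(words, bits, count, word_bits=32):
    lanes = word_bits // bits
    mask = (1 << bits) - 1
    out = []
    for w in words:
        for lane in range(lanes):      # lanes are independent: parallelizable
            out.append((w >> (lane * bits)) & mask)
    return out[:count]                 # drop padding lanes in the last word

vals = [3, 7, 1, 15, 0, 9, 4, 2, 5]    # unsigned 4-bit values
packed = pack(vals, bits=4)            # 9 values fit in just 2 words
restored = unpack(packed, bits=4, count=len(vals))
```

On an FPGA the `unpack` loop body becomes one slice-select per lane, so a 32-bit word yields eight 4-bit operands in a single cycle.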
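Finally, the resource-allocation search can be pictured as a small integer program over parallelism factors, with monotonicity-based pruning cutting the design space. The resource model and budgets below are made-up toy numbers; the actual FAQ-CNN model and its heuristic pruning strategy are more elaborate.

```python
# Toy sketch of design-space exploration: choose a tm x tn compute array
# maximizing throughput under DSP/BRAM budgets. Cost model is illustrative.

DSP_BUDGET = 220
BRAM_BUDGET = 280

def resources(tm, tn):
    dsp = tm * tn            # one multiplier per PE in a tm x tn array
    bram = 4 * (tm + tn)     # buffer cost grows with tile edges (assumed)
    return dsp, bram

best, best_perf = None, 0
for tm in range(1, 33):
    for tn in range(1, 33):
        dsp, bram = resources(tm, tn)
        if dsp > DSP_BUDGET or bram > BRAM_BUDGET:
            break            # costs are monotone in tn: prune the rest
        perf = tm * tn       # proxy objective: MACs per cycle
        if perf > best_perf:
            best, best_perf = (tm, tn), perf
```

The `break` is the pruning step: because both cost terms grow monotonically with `tn`, an infeasible point rules out every larger `tn` for that `tm`, so whole slices of the space are skipped without evaluation.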

Keywords: convolutional neural network quantization; quantization algorithm decoupling; parallel encoding and decoding; on-chip resource modeling; accelerator design

Classification: TP391 (Automation and Computer Technology: Computer Application Technology)

 
