Design and implementation of compiler based on special convolutional neural network accelerator

Cited by: 1

Authors: JIAO Yuming (焦禹铭); WU Kai (吴凯); GUO Fengxiang (郭风祥); WANG Zhao (王昭); SONG Qingzeng (宋庆增)

Affiliations: [1] School of Computer Science and Technology, Tiangong University, Tianjin 300387, China; [2] School of Electrical Engineering, Tiangong University, Tianjin 300387, China; [3] Information Science Academy, China Electronics Technology Group Corporation, Beijing 100086, China

Source: Journal of Computer Applications (《计算机应用》), 2022, Issue S01, pp. 208-214 (7 pages)

Abstract: Deploying deep learning models built in different frameworks is central to putting artificial intelligence into practice. However, the excessive computation and parameter counts of these models, together with the lack of a unified programming model, have led to a steady stream of new dedicated Convolutional Neural Network (CNN) accelerators, which makes model deployment harder. This work improves two aspects: model compression and the compilation toolchain. For model compression, a new channel pruning criterion is proposed that combines the correlation and influence of each channel with the activation values of the corresponding output channel; it greatly reduces the computation and parameter counts of a CNN while preserving accuracy. For the compilation toolchain, an automatic end-to-end optimization stack is designed, a design method for a deep learning compiler targeting Field Programmable Gate Arrays (FPGAs) is proposed, and the pruning algorithm that uses the proposed ranking criterion is added to the intermediate representation. Experimental results show that, on a ship target detection task, the compiler achieves a 1.3x speedup on general-purpose devices with an accuracy loss of no more than 1%, and a 1.6x speedup on the dedicated CNN accelerator, so it can effectively accelerate convolutional networks in deployment.
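The abstract names the ingredients of the pruning criterion (channel correlation, channel influence, and output-channel activation values) but not the formula. The following is a minimal sketch of how such a ranking could look, not the authors' method: the way the three terms are combined, the use of the L1 norm as "influence", Pearson correlation as "correlation", and all function names are assumptions made for illustration.

```python
from math import sqrt


def l1_norm(weights):
    """Assumed 'influence' term: L1 norm of a flattened filter."""
    return sum(abs(w) for w in weights)


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def channel_scores(filters, mean_activations):
    """Score each output channel; a higher score means more worth keeping.

    filters: one flattened weight list per output channel.
    mean_activations: mean absolute activation per output channel.
    The product form below is a hypothetical combination, not the paper's.
    """
    scores = []
    for i, f in enumerate(filters):
        influence = l1_norm(f)
        # Assumed 'correlation' term: strongest correlation with any other channel.
        redundancy = max(
            (abs(pearson(f, g)) for j, g in enumerate(filters) if j != i),
            default=0.0,
        )
        scores.append(influence * (1.0 - redundancy) * mean_activations[i])
    return scores


def select_channels(filters, mean_activations, keep_ratio):
    """Return the sorted indices of the channels that survive pruning."""
    scores = channel_scores(filters, mean_activations)
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    demo_filters = [
        [1.0, -1.0, 1.0, -1.0],
        [1.0, 1.0, -1.0, -1.0],
        [0.01, 0.0, -0.01, 0.0],  # near-zero weights: a likely pruning candidate
        [1.0, -1.0, -1.0, 1.0],
    ]
    demo_activations = [1.0, 1.0, 1.0, 1.0]
    print(select_channels(demo_filters, demo_activations, keep_ratio=0.75))
```

In a compiler setting, a pass like this would run on the intermediate representation and then shrink the affected convolution and the layer that consumes its output; that rewriting step is outside the scope of this sketch.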

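Keywords: Field Programmable Gate Array (FPGA); model compression; deep learning compiler; intermediate representation; object detection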

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]
