Authors: JIAO Yuming (焦禹铭); WU Kai (吴凯); GUO Fengxiang (郭风祥); WANG Zhao (王昭); SONG Qingzeng (宋庆增)
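Affiliations: [1] School of Computer Science and Technology, Tiangong University, Tianjin 300387, China; [2] School of Electrical Engineering, Tiangong University, Tianjin 300387, China; [3] Information Science Academy, China Electronics Technology Group Corporation, Beijing 100086, China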
Source: Journal of Computer Applications (《计算机应用》), 2022, No. S01, pp. 208-214 (7 pages)
Abstract: Deploying deep learning models built in different frameworks is central to putting artificial intelligence into practice. However, excessive model computation and parameter counts, together with the lack of a unified programming model, have led to a steady stream of new special-purpose Convolutional Neural Network (CNN) accelerators, which makes model deployment harder. This work makes improvements in two areas: model compression and the compilation toolchain. For model compression, a new channel pruning criterion is proposed that combines the correlation and influence of channels with the activation values of the corresponding output channels; it greatly reduces the computation and parameter count of a CNN while preserving accuracy. For the compilation toolchain, an automatic end-to-end optimization stack is designed, a design method for a deep learning compiler targeting Field Programmable Gate Arrays (FPGAs) is proposed, and the pruning algorithm based on the proposed ranking criterion is added to the intermediate representation. Experimental results on a ship target detection task show that the designed compiler achieves a 1.3x speedup on general-purpose devices with an accuracy loss of no more than 1%, and a 1.6x speedup on a dedicated CNN accelerator, so it can effectively accelerate convolutional networks in deployment.
Keywords: Field Programmable Gate Array (FPGA); model compression; deep learning compiler; intermediate representation; object detection
Classification number: TP314 [Automation and Computer Technology: Computer Software and Theory]