Authors: SUN Ming, CHEN Xin [1]
Affiliation: [1] School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2021, No. 13, pp. 77-84 (8 pages)
Funding: National Innovation and Entrepreneurship Training Program (20160295077).
Abstract: To meet the low-latency, small-footprint, and high-throughput requirements that practical applications place on Convolutional Neural Network (CNN) inference, an accelerator is designed using the following optimizations. To cope with limited off-chip memory bandwidth, the loop tiling factors are determined through design space exploration so as to maximize data reuse. To match the high computational density of CNNs, loop unrolling is used to fully exploit four kinds of computing parallelism. Memory pooling, ping-pong buffering, and dynamic data quantization are used to manage on-chip and off-chip storage resources. The accelerator generation flow is packaged as a CNN acceleration framework. The generated accelerator is used to implement AlexNet, and simulation results show that the design reaches a peak throughput of 1,493.4 Gops, up to 24.2 times that of the compared works. Its DSP efficiency also exceeds that of the other designs, by a factor of at least 1.2. The approach enables rapid CNN deployment with high development efficiency and strong acceleration performance.
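The loop tiling mentioned in the abstract can be illustrated with a minimal software sketch. It is not the authors' implementation: the function names and the tiling factors `Tm`, `Tn`, `Tr`, `Tc` are assumptions introduced here, and the factors are simply passed in by hand rather than chosen by design space exploration. The sketch only shows that splitting the convolution loops into tiles leaves the result unchanged while restricting each inner pass to a small block of data, which is what would fit in on-chip buffers.

```python
def conv_naive(inp, wts, K):
    """Reference convolution, stride 1, no padding.

    inp is indexed [N][H][W], wts is indexed [M][N][K][K].
    """
    N, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M = len(wts)
    R, C = H - K + 1, W - K + 1
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for r in range(R):
                for c in range(C):
                    for i in range(K):
                        for j in range(K):
                            out[m][r][c] += wts[m][n][i][j] * inp[n][r + i][c + j]
    return out


def conv_tiled(inp, wts, K, Tm, Tn, Tr, Tc):
    """Same convolution with the M, N, R, C loops split into tiles.

    Tm, Tn, Tr, Tc are hypothetical tiling factors (assumed names).
    """
    N, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M = len(wts)
    R, C = H - K + 1, W - K + 1
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    # Outer loops walk over tiles. In a hardware design, each iteration
    # would correspond to loading one block of inputs and weights on chip.
    for m0 in range(0, M, Tm):
        for r0 in range(0, R, Tr):
            for c0 in range(0, C, Tc):
                for n0 in range(0, N, Tn):
                    # Inner loops touch only the current tile; min() handles
                    # tiles that do not divide the dimension evenly.
                    for m in range(m0, min(m0 + Tm, M)):
                        for n in range(n0, min(n0 + Tn, N)):
                            for r in range(r0, min(r0 + Tr, R)):
                                for c in range(c0, min(c0 + Tc, C)):
                                    for i in range(K):
                                        for j in range(K):
                                            out[m][r][c] += (
                                                wts[m][n][i][j] * inp[n][r + i][c + j]
                                            )
    return out
```

With integer inputs, `conv_tiled` returns exactly the same nested list as `conv_naive` for any positive tiling factors. Larger tiles mean more reuse of each loaded block but need more buffer space, which is the trade-off a design space exploration would weigh.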