Authors: LI Zong-ling, WANG Lu-yuan[1], YU Ji-yang[1], CHENG Bo-wen[1], HAO Liang, ZHANG Wei-gong[2]
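Affiliations: [1] Institute of Spacecraft System Engineering, Beijing 100094, China; [2] School of Information Engineering, Capital Normal University, Beijing 100048, China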
Source: Computer Technology and Development (《计算机技术与发展》), 2019, No. 7, pp. 11-16 (6 pages)
Funding: National Natural Science Foundation of China (61472260)
Abstract: Based on the structural characteristics of forward inference in deep convolutional neural networks (CNNs), a deep CNN accelerator built on multiple parallel computation and storage units is designed. The parallel characteristics of convolution are analyzed from two perspectives, computational efficiency and data reuse, and a fully parallel, pipelined implementation of the fully connected layers is studied. The accelerator uses a parallel pipelined structure to raise computational efficiency. In the convolutional layers, it combines several parallel convolution architectures to balance computational efficiency against the bandwidth needed to load parameters and data, and achieves full pipelining within each convolutional layer through three acceleration modes. In the fully connected layers, the multiply-accumulate operation is implemented as a fully pipelined architecture with a pipeline latency of no more than 20 processing clock cycles, and parallel computation yields a 16× speedup. In validation experiments on the public ImageNet dataset, the accelerator performs up to 2,304 multiply-accumulate operations per cycle. At an operating frequency of 150 MHz, its peak throughput reaches 691.2 Gops, and its energy efficiency is more than 2,700 times that of an i7-6700 CPU and more than 290 times that of a GTX 1050 GPU. The accelerator strikes a good balance among hardware resources, computational accuracy, speed, and power consumption, which makes it well suited to spaceborne embedded environments.
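The throughput figures in the abstract can be cross-checked with a short calculation. The sketch below assumes that "Gops" counts each multiply-accumulate (MAC) as two operations (one multiply plus one add), which is the convention under which 2,304 MACs per cycle at 150 MHz gives 691.2 Gops. The fully connected part is a hypothetical cycle model, not the paper's actual schedule: the helper names `fc_cycles` and `fc_neuron`, the interleaved split of each dot product across 16 MAC lanes, and the use of the VGG-16 FC6 shape (25088 inputs to 4096 outputs) as an example size are all illustrative assumptions.

```python
import math

# --- Peak rate (figures quoted in the abstract) ---------------------------
MACS_PER_CYCLE = 2304   # MACs per clock cycle
CLOCK_HZ = 150e6        # 150 MHz operating frequency
OPS_PER_MAC = 2         # assumption: one MAC = one multiply + one add


def peak_gops(macs_per_cycle, clock_hz, ops_per_mac=OPS_PER_MAC):
    """Peak throughput in Gops (1e9 operations per second)."""
    return macs_per_cycle * ops_per_mac * clock_hz / 1e9


# --- Fully connected layer on parallel MAC lanes (hypothetical model) -----
def fc_cycles(n_in, n_out, lanes=16, pipeline_latency=20):
    """Clock cycles for one FC layer, assuming each of `lanes` MAC units
    consumes one input per cycle, the pipeline is kept full, and the
    pipeline fill latency (<= 20 cycles per the abstract) is paid once."""
    return n_out * math.ceil(n_in / lanes) + pipeline_latency


def fc_neuron(x, w, lanes=16):
    """One output neuron computed as `lanes` interleaved partial sums,
    then reduced; mathematically the same dot product as sum(x * w)."""
    partial = [0.0] * lanes
    for i, (xi, wi) in enumerate(zip(x, w)):
        partial[i % lanes] += xi * wi
    return sum(partial)


if __name__ == "__main__":
    print(f"peak: {peak_gops(MACS_PER_CYCLE, CLOCK_HZ):.1f} Gops")  # peak: 691.2 Gops
    # VGG-16 FC6 shape, used only as an illustrative layer size.
    serial = fc_cycles(25088, 4096, lanes=1)
    parallel = fc_cycles(25088, 4096, lanes=16)
    print(f"FC6 speedup with 16 lanes: {serial / parallel:.2f}x")  # 16.00x
```

Under this model the fixed pipeline latency is negligible next to the millions of MAC cycles in a large fully connected layer, so the speedup approaches the lane count, consistent with the 16× figure reported in the abstract.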
Keywords: convolutional neural network; parallel computing and storage; accelerator; VGG-16 model; field-programmable gate array (FPGA)
CLC number: TP39 [Automation and Computer Technology / Computer Application Technology]