High-performance multiply-accumulator for convolutional neural network accelerators

Cited by: 3

Authors: KONG Xin, CHEN Gang[1], GONG Guoliang[1], LU Huaxiang[1,2,3,4], MAO Wenyu

Affiliations: [1] Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] Center of Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; [4] Semiconductor Neural Network Intelligent Perception and Computing Technology Beijing Key Lab, Beijing 100083, China

Source: Journal of Xidian University, 2020, No. 4, pp. 55-63, 93 (10 pages)

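Funding: National Natural Science Foundation of China (U19A2080, U1936106, 61701473); Strategic Priority Research Program of the Chinese Academy of Sciences, Category A (XDA18040400); STS Program of the Chinese Academy of Sciences (KFJ-STS-ZDTP-070); High Technology Project (31513070501); Beijing Municipal Science and Technology Project (Z181100001518006); Science and Technology Innovation Special Zone Project (1916312ZD00902201).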

Abstract: Multiply-accumulators (MACs) in existing convolutional neural network (CNN) accelerators generally suffer from large area, high power consumption, and low speed. To address these problems, this paper presents a full-custom, high-performance MAC based on transmission-gate structures. A new accumulation data compression structure suited to the MAC is proposed, which reduces the hardware overhead. A new parallel adder architecture is also proposed; at the same hardware cost as the Brent-Kung adder, it reduces the number of gate-delay stages and so increases the calculation speed. In addition, the advantages of the transmission gate are used to optimize each unit circuit of the MAC. The 16-by-8 fixed-point high-performance MAC designed with these methods has a critical path delay of 1.173 ns, a layout area of 9049.41 μm², and an average power consumption of 4.153 mW at 800 MHz in the SMIC 130 nm process at the tt corner. Compared with the traditional MAC, the speed is increased by about 37.42%, the area is reduced by about 47.8%, and the power consumption under the same conditions is reduced by about 56.77%.
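The abstract does not give the circuit details, so the following is only a behavioural sketch of the general ideas it names, not the authors' design. It assumes unsigned operands, a 32-bit accumulator, and a textbook carry-save scheme in which the running total is kept as a sum word and a carry word and is compressed together with the partial products by 3:2 compressors. The final carry-propagate addition uses a standard Brent-Kung prefix network, which is the baseline the paper compares against rather than the proposed adder. All names (`WIDTH`, `csa`, `mac_step`, `brent_kung_add`, `mac`) are hypothetical.

```python
WIDTH = 32                      # accumulator width: an assumption, not from the paper
MASK = (1 << WIDTH) - 1


def csa(a, b, c):
    """3:2 compressor: reduce three words to a sum word and a carry word."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, carry & MASK


def partial_products(x, w, w_bits=8):
    """Unsigned shift-and-AND partial products of x * w, one per set bit of w."""
    return [(x << i) & MASK for i in range(w_bits) if (w >> i) & 1]


def mac_step(acc_sum, acc_carry, x, w):
    """Fold x * w into the carry-save accumulator using only 3:2 compressors."""
    rows = partial_products(x, w) + [acc_sum, acc_carry]
    while len(rows) > 2:
        rows = rows[3:] + list(csa(*rows[:3]))
    return rows[0], rows[1]


def brent_kung_add(a, b, width=WIDTH):
    """Add two words with a Brent-Kung prefix network (width must be a power of two)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate bits
    G = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate bits
    P = p[:]
    d = 1
    while d < width:                       # up-sweep: build block prefixes
        for i in range(2 * d - 1, width, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d *= 2
    d = width // 4
    while d >= 1:                          # down-sweep: fill in remaining prefixes
        for i in range(3 * d - 1, width, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d //= 2
    result = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0   # carry into bit i, with carry-in 0
        result |= (p[i] ^ carry_in) << i
    return result


def mac(xs, ws):
    """Accumulate sum(x * w) modulo 2**WIDTH; carries propagate only once, at the end."""
    acc_sum, acc_carry = 0, 0
    for x, w in zip(xs, ws):
        acc_sum, acc_carry = mac_step(acc_sum, acc_carry, x, w)
    return brent_kung_add(acc_sum, acc_carry)
```

For example, `mac([3, 5], [2, 4])` returns 26. Keeping the accumulator in carry-save form means the slow carry-propagate adder is used once per dot product instead of once per multiplication, which is the usual motivation for merging accumulation into the compression stage.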

Keywords: multiply-accumulator; transmission gate; accumulation compression; convolutional neural network; high performance

Classification: TN4 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]
