Authors: LIU Si-yang; JIANG Jian-fei [1]; MAO Zhi-gang [1] (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)
Affiliation: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Source: Microelectronics & Computer (《微电子学与计算机》), 2021, Issue 5, pp. 1-6 (6 pages)
Abstract: With the development of deep learning and neural network technology, and in order to fully exploit the parallelism of convolutional neural network (CNN) computation, hardware accelerators have seen increasingly wide adoption owing to their high speed, low cost, and high fault tolerance. This paper proposes a new algorithm that optimizes a CNN layer by layer and designs the corresponding instruction set. The proposed algorithm can be used to find an optimal acceleration scheme for different networks under specific computing and storage resources. During optimization, different types of data can be quantized to half precision to reduce memory accesses. Based on a 40 nm CMOS process and the proposed algorithm, an instruction-set-controlled neural network accelerator was designed. The accelerator achieves a peak performance of 416 GOP/s at a working frequency of 200 MHz. Inference of the VGG16 network was implemented on the accelerator, with a total network latency of only 116 ms.
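The abstract's half-precision quantization and resource-fitting idea can be illustrated with a small back-of-the-envelope calculation. The Python sketch below is not the paper's optimization algorithm or instruction set; it only assumes representative VGG16 layer shapes (224x224 input) to show how FP16 storage roughly halves per-layer weight and activation traffic compared with FP32, and it checks the ops-per-cycle arithmetic implied by the reported 416 GOP/s at 200 MHz.

# Illustrative sketch only (hypothetical layer list, not the paper's method):
# estimate per-layer weight and activation storage for a few VGG16-style conv
# layers in FP32 vs FP16, to show why half precision halves memory traffic.

# (name, in_channels, out_channels, kernel, output_h, output_w), 224x224 input
layers = [
    ("conv1_1", 3,   64,  3, 224, 224),
    ("conv2_1", 64,  128, 3, 112, 112),
    ("conv3_1", 128, 256, 3, 56,  56),
    ("conv5_3", 512, 512, 3, 14,  14),
]

def layer_bytes(cin, cout, k, oh, ow, bytes_per_elem):
    """Return (weight bytes, output activation bytes) for one conv layer."""
    weights = cin * cout * k * k * bytes_per_elem
    activations = cout * oh * ow * bytes_per_elem
    return weights, activations

for name, cin, cout, k, oh, ow in layers:
    w32, a32 = layer_bytes(cin, cout, k, oh, ow, 4)  # FP32: 4 bytes/element
    w16, a16 = layer_bytes(cin, cout, k, oh, ow, 2)  # FP16: 2 bytes/element
    print(f"{name}: weights {w32/1e6:.3f} MB -> {w16/1e6:.3f} MB, "
          f"activations {a32/1e6:.3f} MB -> {a16/1e6:.3f} MB")

# Peak-throughput arithmetic implied by the reported figures, assuming one MAC
# counts as two operations: 416 GOP/s / 200 MHz = 2080 ops per cycle, i.e. on
# the order of 1040 parallel MAC units.
print(416e9 / 200e6, "operations per cycle")

A layer-by-layer optimizer of the kind described in the abstract would consume per-layer figures like these, together with the accelerator's on-chip storage budget, to choose a tiling and quantization scheme for each layer; the numbers above are only for orientation.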
Classification code: TN722.1 [Electronics and Telecommunications: Circuits and Systems]