Optimal design of computing resources for CNN convolution layer hardware (面向CNN卷积层硬件的计算资源优化设计)

Authors: WANG Binyu (王彬燏), YANG Zhijia (杨志家), XIE Chuang (谢闯) [2,3], LIAN Lian (连莲), WANG Ying (王颖) [1,2,3]

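Affiliations: [1] College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, Liaoning, China; [2] Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, Liaoning, China; [3] Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, Liaoning, China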

Source: Microelectronics & Computer (《微电子学与计算机》), 2024, No. 7, pp. 89-95 (7 pages)

Funding: National Key Research and Development Program of China (2022YFB3204501).

Abstract: Conventional dedicated accelerators for convolutional neural networks (CNNs) suffer from low hardware resource utilization when they implement convolution-layer operator reconstruction, data reuse, and computing-resource reuse. To address this, a hardware architecture that combines a dynamic register file with a reconfigurable PE array is designed. By optimizing the data flow, it balances the load across the PE units and thereby improves the utilization of convolution-layer computing resources. The architecture can flexibly deploy odd-sized convolution kernels with sizes in the range 0 to 11 and strides from 1 to 10, and it supports multi-channel parallel convolution and input data reuse. The design is implemented in the Verilog hardware description language and functionally verified in a UVM environment. Experiments show that, when accelerating the convolution layers of the AlexNet model, peak throughput is 9.5% to 64.3% higher than in related work; when mapping convolution kernels of different sizes and strides from five classic neural networks, the average utilization of the PE units is 4% to 11% higher than in related work.

Keywords: reconfigurable PE; dynamic register file; flexibility; resource utilization

Classification code: TN492 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]
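
The utilization problem described in the abstract can be illustrated with a small sketch. It is not the authors' architecture or data flow: the 12 x 12 array size, the whole-tile packing rule, and the function names `conv_output_size` and `static_tile_utilization` are all assumptions made for illustration. It only shows why a fixed PE array that maps each K x K kernel onto a K x K tile leaves PEs idle for some kernel sizes, which is the kind of imbalance a reconfigurable array aims to reduce.

```python
def conv_output_size(input_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Standard convolution output length along one spatial dimension."""
    return (input_size + 2 * padding - kernel) // stride + 1


def static_tile_utilization(rows: int, cols: int, kernel: int) -> float:
    """Fraction of busy PEs when only whole kernel x kernel tiles are packed
    into a rows x cols array (hypothetical static mapping, not the paper's)."""
    tiles = (rows // kernel) * (cols // kernel)
    return tiles * kernel * kernel / (rows * cols)


if __name__ == "__main__":
    # Hypothetical 12 x 12 PE array; odd kernel sizes as mentioned in the abstract.
    for k in (1, 3, 5, 7, 11):
        print(f"kernel {k:2d}: utilization {static_tile_utilization(12, 12, k):.1%}")
    # Output width for an 11 x 11 kernel with stride 4 on an assumed 227-pixel input.
    print(conv_output_size(227, kernel=11, stride=4))
```

Under these assumptions a 3 x 3 kernel fills the array completely, while 5 x 5 and 7 x 7 kernels leave a large share of PEs unused.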
