Authors: 庞明义 (PANG Mingyi), 魏祥麟 (WEI Xianglin), 张云祥 (ZHANG Yunxiang), 王斌 (WANG Bin), 庄建军 (ZHUANG Jianjun)
Affiliations: [1] School of Electronics and Information Engineering, Nanjing University of Information Science & Technology, Nanjing 211800, China; [2] The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
Source: Computer Science (《计算机科学》), 2025, No. 4, pp. 94-100 (7 pages)
Abstract: This paper proposes an adaptive convolutional neural network accelerator (ACNNA) for non-GPU, resource-constrained chips, which adaptively generates a matching hardware accelerator from the resource constraints of the hardware platform and the structure of the convolutional neural network. Through its reconfigurable design, ACNNA can effectively accelerate various combinations of network layers, including convolutional, pooling, activation, and fully connected layers. First, a resource-folding multi-channel processing engine (PE) array is designed, which folds the idealized convolution structure to save resources and unrolls it along the output channels to support parallel computing. Second, multi-level storage and a ping-pong caching mechanism are adopted to optimize the pipeline, effectively improving data-processing efficiency. Third, a resource-reuse strategy under multi-level storage is proposed; combined with a design space exploration algorithm, it schedules hardware resource allocation according to the network parameters, enabling low-resource chips to deploy deeper networks with more parameters. Taking the LeNet5 and VGG16 network models as examples, ACNNA is validated on the Ultra96 V2 development board. The results show that the ACNNA deployment of VGG16 consumes as little as 4% of the resources of the original network. At a 100 MHz clock frequency, the LeNet5 accelerator achieves a computing rate of 0.37 GFLOPS at a power consumption of 2.05 W, and the VGG16 accelerator achieves 1.55 GFLOPS at 2.13 W. Compared with existing work, ACNNA improves frames per second (FPS) by more than 83%.
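The resource-folding and design-space-exploration ideas summarized in the abstract can be illustrated with a minimal sketch: given a DSP budget, pick an output-channel unroll factor for the PE array, folding each layer's remaining output channels into sequential passes. All names, the layer tuples, and the simple DSP/cycle cost model below are hypothetical illustrations, not the paper's actual scheduler.

```python
# Sketch of output-channel unrolling under a DSP budget (hypothetical model):
# each PE costs a fixed number of DSPs, and a layer whose output channels
# exceed the unroll factor is "folded" into multiple sequential passes.
import math

def explore_parallelism(layers, dsp_budget, dsps_per_pe=5):
    """Exhaustively pick the unroll factor minimizing estimated cycles.

    layers: list of (out_channels, total_macs) tuples per conv layer.
    Returns (unroll_factor, estimated_cycles) for the whole network.
    """
    best = None
    max_pes = dsp_budget // dsps_per_pe  # PEs that fit in the budget
    for unroll in range(1, max_pes + 1):
        cycles = 0
        for out_ch, macs in layers:
            # Folding: ceil(out_ch / unroll) sequential passes per layer,
            # each pass costing the MACs of one output channel.
            passes = math.ceil(out_ch / unroll)
            cycles += passes * (macs // out_ch)
        if best is None or cycles < best[1]:
            best = (unroll, cycles)
    return best

# Toy two-layer network: (out_channels, total MACs) per layer.
toy_layers = [(64, 64_000), (128, 256_000)]
unroll, cycles = explore_parallelism(toy_layers, dsp_budget=360)
```

With a 360-DSP budget and 5 DSPs per PE, 72 PEs fit; the search settles on unrolling 64 output channels, folding the 128-channel layer into two passes. The same trade-off is what lets a fixed low-resource fabric host networks of very different depths.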
Keywords: hardware acceleration; convolutional neural network; design space exploration strategy; field-programmable gate array (FPGA)
Classification: TP391 [Automation and Computer Technology — Computer Application Technology]