Authors: 庞明义 (PANG Mingyi), 魏祥麟 (WEI Xianglin), 张云祥 (ZHANG Yunxiang), 王斌 (WANG Bin), 庄建军 (ZHUANG Jianjun)
Affiliations: [1] School of Electronics and Information Engineering, Nanjing University of Information Science & Technology, Nanjing 211800, China; [2] The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
Source: Computer Science (《计算机科学》), 2025, No. 4, pp. 94-100 (7 pages)
Abstract: This paper proposes an adaptive convolutional neural network accelerator (ACNNA) for non-GPU, resource-constrained chips, which adaptively generates a matching hardware accelerator from the resource constraints of the hardware platform and the structure of the convolutional neural network. Through its reconfigurable design, ACNNA can effectively accelerate various combinations of network layers, including convolutional, pooling, activation, and fully connected layers. First, a resource-folding multi-channel processing engine (PE) array is designed, which folds the idealized convolution structure to save resources and unrolls it along the output channels to support parallel computing. Second, multi-level storage and a ping-pong caching mechanism are adopted to optimize the pipeline, effectively improving data-processing efficiency. Third, a resource-reuse strategy under multi-level storage is proposed; combined with a design space exploration algorithm, it schedules hardware resource allocation according to the network parameters, enabling low-resource chips to deploy deeper networks with more parameters. Taking the LeNet5 and VGG16 network models as examples, ACNNA is validated on the Ultra96 V2 development board. The results show that the ACNNA deployment of VGG16 consumes as little as 4% of the resources of the original network. At a 100 MHz clock frequency, the LeNet5 accelerator achieves a computing rate of 0.37 GFLOPS at a power consumption of 2.05 W, and the VGG16 accelerator achieves 1.55 GFLOPS at 2.13 W. Compared with existing work, ACNNA improves frames per second (FPS) by more than 83%.
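The resource-folding and design-space-exploration ideas summarized in the abstract can be illustrated with a minimal sketch: given a DSP budget, pick an output-channel unroll factor for the PE array, folding each layer's remaining output channels into sequential passes. All names, the layer tuples, and the simple DSP/cycle cost model below are hypothetical illustrations, not the paper's actual scheduler.

```python
# Sketch of output-channel unrolling under a DSP budget (hypothetical model):
# each PE costs a fixed number of DSPs, and a layer whose output channels
# exceed the unroll factor is "folded" into multiple sequential passes.
import math

def explore_parallelism(layers, dsp_budget, dsps_per_pe=5):
    """Exhaustively pick the unroll factor minimizing estimated cycles.

    layers: list of (out_channels, total_macs) tuples per conv layer.
    Returns (unroll_factor, estimated_cycles) for the whole network.
    """
    best = None
    max_pes = dsp_budget // dsps_per_pe  # PEs that fit in the budget
    for unroll in range(1, max_pes + 1):
        cycles = 0
        for out_ch, macs in layers:
            # Folding: ceil(out_ch / unroll) sequential passes per layer,
            # each pass costing the MACs of one output channel.
            passes = math.ceil(out_ch / unroll)
            cycles += passes * (macs // out_ch)
        if best is None or cycles < best[1]:
            best = (unroll, cycles)
    return best

# Toy two-layer network: (out_channels, total MACs) per layer.
toy_layers = [(64, 64_000), (128, 256_000)]
unroll, cycles = explore_parallelism(toy_layers, dsp_budget=360)
```

With a 360-DSP budget and 5 DSPs per PE, 72 PEs fit; the search settles on unrolling 64 output channels, folding the 128-channel layer into two passes. The same trade-off is what lets a fixed low-resource fabric host networks of very different depths.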
Keywords: hardware acceleration; convolutional neural network; design space exploration strategy; field-programmable gate array (FPGA)
Classification: TP391 [Automation and Computer Technology — Computer Application Technology]