Authors: YU Yunjun (余运俊)[1], ZHANG Pengfei (张鹏飞), GONG Hancheng (龚汉城), CHEN Min (陈敏)
Affiliations: [1] School of Information Engineering, Nanchang University, Nanchang 330000, China; [2] Jiangxi Jiangtou Digital Economy Research Institute, Nanchang 330000, China
Source: Computer Science (《计算机科学》), 2023, Issue S02, pp. 820-826 (7 pages)
Funding: National International Science and Technology Cooperation Program (2014DFG72240); Jiangxi Province Key Research and Development Program (20214BBG74006).
Abstract: As the volume of data on edge devices grows and neural networks are increasingly deployed in practice, edge computing relieves some of the pressure on cloud-centric big data technologies. Field-programmable gate arrays (FPGAs), with their flexible architecture and low power consumption, are well suited to edge computing and to building neural network accelerators. However, FPGA solutions based on conventional convolution algorithms are often limited by the number of on-chip computing units. This work uses Zynq as the hardware acceleration platform, applies fixed-point quantization to the parameters, and uses array partitioning to speed up the pipeline. The Winograd fast convolution algorithm replaces conventional convolution, trading multiplications in the convolution for additions; this reduces the computational complexity of the model and greatly improves the computational performance of the accelerator. Experiments show that the XC7Z035, running at a 150 MHz clock, achieves 43.5 GOP/s, with an energy efficiency 129 times that of the Xeon(R) Silver 4214R and 159 times that of a dual-core ARM. The proposed scheme delivers high performance under resource and power constraints and is suitable for deploying lightweight neural networks at the network edge.
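Keywords: edge computing; hardware acceleration; lightweight convolutional neural network; Winograd; FPGA
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]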
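
Illustration: the abstract states that Winograd fast convolution trades multiplications for additions. The sketch below shows the idea on the smallest standard case, the 1D F(2,3) transform, which produces 2 outputs of a 3-tap filter with 4 multiplications instead of 6. It is only an illustration under assumptions: the record does not say which Winograd tile size the accelerator uses, and the function names `direct_conv_1d` and `winograd_f23` are hypothetical, not taken from the paper.

```python
def direct_conv_1d(d, g):
    """Sliding-window product: 2 outputs from 4 inputs and a 3-tap filter.

    Uses 2 * 3 = 6 multiplications.
    """
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 multiplications.

    Assumed illustration only; the paper's actual tile size is not given
    in this record.
    """
    # Filter transform. It depends only on the weights, so in an
    # accelerator it can be computed once, ahead of time.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The only 4 multiplications involving the input data.
    m0, m1, m2, m3 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    tile = [1, 2, 3, 4]
    taps = [1, 0, -1]
    print(direct_conv_1d(tile, taps))
    print(winograd_f23(tile, taps))
```

Both functions return the same values for the same inputs. The saving comes from moving work out of multipliers, which are the scarce on-chip resource the abstract mentions, and into adders.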