Authors: YAN Fei (严飞) [1,2]; ZHENG Xuwen (郑绪文); MENG Chuan (孟川); LI Chu (李楚); LIU Yinping (刘银萍)
Affiliations: [1] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China; [2] Jiangsu Collaborative Innovation Center for Atmospheric Environment and Equipment Technology, Nanjing 210044, China; [3] School of Emergency Management, Nanjing University of Information Science and Technology, Nanjing 210044, China
Source: Modern Electronics Technique (《现代电子技术》), 2025, No. 1, pp. 151-156 (6 pages)
Funding: National Natural Science Foundation of China (61605083); Key Research and Development Program of Jiangsu Province (BE2020006-2).
Abstract: The convolutional neural network (CNN) is a commonly used algorithm in object detection. However, the large number of parameters and heavy computation load of a CNN lead to slow detection, high power consumption, and difficulty in deploying the network on hardware platforms. This paper therefore proposes an application method that uses a fused CPU-FPGA structure to accelerate MobileNetV1 object detection. First, the number of parameters and the computation load of the network model are reduced by setting a width hyperparameter and a resolution hyperparameter and by fixed-point quantization of the network parameters. Second, the convolutional layers and batch normalization layers are fused to reduce network complexity and improve computation speed. Then, an eight-channel inter-kernel parallel convolution computing engine is designed, in which each channel implements the convolution operation with a line-buffer multiplier and adder-tree structure. Finally, by exploiting FPGA parallel computing and a pipelined structure, the eight-channel convolution engine is reused appropriately to perform three different types of convolution, reducing hardware resource usage and power consumption. Experimental results show that the design can accelerate MobileNetV1 object detection in hardware, achieving a frame rate of 56.7 f/s at a power consumption of only 0.603 W.
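The convolution/batch-normalization fusion and the fixed-point quantization mentioned in the abstract follow well-known transformations; the abstract does not give the paper's exact bit widths or implementation, so the sketch below is only a minimal NumPy illustration. The function names, the 16-bit word length, and the 8 fractional bits are illustrative assumptions, not the authors' design.

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer that follows a convolution into the conv weights/bias.

    w:     conv weights, shape (out_ch, in_ch, kh, kw)
    b:     conv bias, shape (out_ch,)  (use zeros if the conv has no bias)
    gamma, beta, mean, var: per-output-channel BN parameters, shape (out_ch,)
    Returns (w_fused, b_fused) such that BN(conv(x, w, b)) == conv(x, w_fused, b_fused).
    """
    scale = gamma / np.sqrt(var + eps)          # per-channel scale factor
    w_fused = w * scale.reshape(-1, 1, 1, 1)    # scale each output channel's kernel
    b_fused = (b - mean) * scale + beta         # fold the BN shift into the bias
    return w_fused, b_fused

def quantize_fixed_point(x, frac_bits=8, total_bits=16):
    """Round a float array to signed fixed-point and return the dequantized values.

    frac_bits / total_bits are hypothetical choices for illustration only.
    """
    step = 1 << frac_bits
    qmin, qmax = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = np.clip(np.round(x * step), qmin, qmax).astype(np.int32)
    return q / step
```

Folding BN into the preceding convolution removes one layer from the inference graph, which is why the abstract describes it as reducing network complexity; the quantization step then lets the fused weights be stored in narrow fixed-point words suitable for an FPGA datapath.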