Authors: Tan Huisheng[1], Xiao Xinkai, Qing Xiang (College of Railway Transportation, Hunan University of Technology, Zhuzhou 412000, China)
Affiliation: [1] College of Railway Transportation, Hunan University of Technology, Zhuzhou, Hunan 412000, China
Source: Semiconductor Technology (《半导体技术》), 2025, No. 1, pp. 55-63 (9 pages)
Funding: Hunan Provincial Degree and Postgraduate Teaching Reform Research Project (2022JGYB183).
Abstract: To address the constraints that algorithm complexity, execution speed, and hardware resources place on deploying neural networks in embedded devices, a high-performance YOLOv3-tiny network hardware accelerator was designed on the Zynq heterogeneous platform. For algorithm optimization, the convolutional and batch normalization layers were fused and an 8-bit quantization algorithm was applied, which simplified the algorithm flow. For the accelerator architecture, a dynamically configurable inter-layer pipeline and an efficient data transfer scheme were designed, which shortened the inference time and reduced storage resource consumption. For forward inference, an 8-channel parallel pipelined convolution module was designed based on a loop unrolling strategy; for pooling, a step-by-step calculation strategy was used to process continuous data streams efficiently; for upsampling, a 2x upsampling method based on data replication was proposed. Experimental results show a forward inference time of 232 ms, a power consumption of only 2.29 W, a system operating frequency of 200 MHz, and an actual computing performance of 23.97 GOPS.
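Keywords: YOLOv3-tiny network; heterogeneous platform; hardware accelerator; dynamically configurable architecture; hybrid hardware optimization; data-replication upsampling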
Classification No.: TN79 [Electronics and Telecommunications: Circuits and Systems]
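
Note: the abstract names two optimizations without giving implementation details: fusing batch normalization into the preceding convolution, and 2x upsampling by data replication. The following is only a minimal software sketch of the standard form of each technique, not the authors' hardware design. The function names `fold_batchnorm` and `upsample2x_replicate`, the list-based data layout, and the default `eps` value are assumptions made for illustration.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel (assumed layout).
    The other arguments hold one value per output channel.
    For inference, the fused convolution gives the same output as
    batch normalization applied to the original convolution.
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        fused_w.append([w * scale for w in w_c])
        fused_b.append((b_c - m) * scale + bt)  # BN shift absorbed into bias
    return fused_w, fused_b


def upsample2x_replicate(fmap):
    """2x upsampling of a 2-D feature map: copy each value into a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate the row
    return out
```

Folding removes the separate normalization pass at inference time, and replication-based upsampling needs no arithmetic at all, which is presumably why both suit a resource-constrained accelerator.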