Authors: XIE Shuai (谢帅) [1], JIANG Li (蒋力) [2], YE Yaoyao (叶瑶瑶) [1]
Affiliations: [1] Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Source: Microelectronics & Computer (《微电子学与计算机》), 2021, Issue 8, pp. 13-19 (7 pages)
Abstract: Object detection demands both high accuracy and real-time performance, and the YOLOv3-tiny network performs well on both counts. Its complex network structure, however, means that practical deployments need targeted optimization in both software and hardware. To meet the real-time requirement, three optimization techniques are combined. At the software level, batch normalization layers are fused to reduce computation, and a low bit width is used to increase resource utilization. A multi-dimensional parallel FPGA computing core is designed to match multiple convolutional layers and improve overall throughput. Fine-grained inter-layer pipelining and a ping-pong buffer design reduce data transfer time. On a ZCU104 FPGA, the design achieves a detection latency of 21 ms for 418×418 images, outperforming comparable accelerator designs, and improves DSP efficiency by a factor of 2.86 or 8.81.
Keywords: YOLOv3-tiny; FPGA accelerator; multi-dimensional parallelism; low latency; high DSP efficiency
Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
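
The abstract names batch normalization fusion as its software-level optimization without giving details. As a minimal sketch of the general technique (not the authors' implementation), the per-channel normalization can be folded into the preceding convolution's weights and bias using w' = w * gamma / sqrt(var + eps) and b' = (b - mean) * gamma / sqrt(var + eps) + beta. All function and parameter names below are hypothetical, and each output channel's kernel is assumed to be flattened into a plain list.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of kernel weights per output channel (assumed layout)
    bias, gamma, beta, mean, var: one float per output channel
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel scaling factor
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def conv_point(weights, bias, patch):
    """Compute one output pixel per channel: dot(kernel, patch) + bias."""
    return [sum(w * x for w, x in zip(w_c, patch)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batch_norm(values, gamma, beta, mean, var, eps=1e-5):
    """Reference inference-time batch norm applied to one value per channel."""
    return [g * (y - mu) / math.sqrt(v + eps) + bt
            for y, g, bt, mu, v in zip(values, gamma, beta, mean, var)]


# Illustrative values only: two output channels, three-element kernels.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
patch = [1.0, 2.0, -0.5]

reference = batch_norm(conv_point(weights, bias, patch), gamma, beta, mean, var)
folded_w, folded_b = fold_batch_norm(weights, bias, gamma, beta, mean, var)
fused = conv_point(folded_w, folded_b, patch)
```

After folding, `fused` matches `reference` up to floating-point rounding, so the separate normalization step no longer has to be computed at inference time.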