Multidimensional Parallel FPGA Accelerator Design for Real-Time Object Detection

Cited by: 2


Authors: XIE Shuai; JIANG Li; YE Yaoyao (Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai 200240, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)

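Affiliations: [1] Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai 200240 [2] Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240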

Source: Microelectronics & Computer (《微电子学与计算机》), 2021, No. 8, pp. 13-19 (7 pages)

Abstract: Object detection places high demands on both accuracy and real-time performance, and the YOLOv3-tiny network performs well on both counts. Its complex network structure, however, means that practical deployment requires targeted optimization in both software and hardware. To meet the real-time requirement, three optimization techniques are combined. At the software level, batch normalization layers are fused to reduce the amount of computation, and low-bit-width data is used to increase resource utilization. A multidimensional parallel FPGA computation core is designed to match multiple convolutional layers and improve overall throughput. Fine-grained inter-layer pipelining and ping-pong buffers reduce data transfer time. On a ZCU104 FPGA, the design achieves a detection latency of 21 ms for 418×418 images, which outperforms comparable accelerator designs, and improves DSP efficiency by 2.86 times or 8.81 times.
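The batch normalization fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard inference-time folding, in which the per-channel scale and shift of a batch normalization layer are absorbed into the preceding convolution's weights and bias. This record does not describe the paper's actual implementation, so the function names (`fold_batch_norm`, `conv_point`, `batch_norm`) and all numeric values below are hypothetical.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel (illustrative layout).
    Returns new (weights, bias) such that conv' == batch_norm(conv).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel BN scale
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def conv_point(weights, bias, patch):
    """Convolution output at one position: a dot product per output channel."""
    return [sum(w * x for w, x in zip(w_c, patch)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization applied per channel."""
    return [g * (yi - mu) / math.sqrt(v + eps) + bt
            for yi, g, bt, mu, v in zip(y, gamma, beta, mean, var)]


# Hypothetical values: 2 output channels, a 3-element input patch.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
patch = [1.0, 2.0, -1.0]

# Unfused path: convolution followed by a separate batch-norm step.
reference = batch_norm(conv_point(weights, bias, patch), gamma, beta, mean, var)

# Fused path: a single convolution with folded parameters.
fw, fb = fold_batch_norm(weights, bias, gamma, beta, mean, var)
fused = conv_point(fw, fb, patch)
```

Both paths give the same result up to floating-point rounding, but the fused path needs no separate normalization pass at inference time, which is the kind of computation saving the abstract refers to.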

Keywords: YOLOv3-tiny; FPGA accelerator; multidimensional parallelism; low latency; high DSP efficiency

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

 
