Design of a Soft-NMS-Based Accelerator for Removing Redundant Candidate Boxes  Cited by: 8

A redundancy-reduced candidate box accelerator based on soft-non-maximum suppression


Authors: LI Jing-lin; JIANG Jing-fei[1]; DOU Yong[1]; XU Jin-wei; WEN Dong (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Affiliation: [1] College of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China

Source: Computer Engineering & Science (《计算机工程与科学》), 2021, No. 4, pp. 586-593 (8 pages)

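Funding: National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2018ZX01028101).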

Abstract: Object detection tasks usually apply the non-maximum suppression (NMS) algorithm to remove redundant candidate boxes from the output of a convolutional neural network. Instead of directly deleting candidate boxes whose overlap exceeds a predefined threshold, as Hard-NMS does, Soft-NMS gradually decays their scores, which avoids mistakenly deleting the candidate boxes of overlapping objects in the image and improves detection accuracy. However, frequently updating the candidate box scores makes Soft-NMS more complex than Hard-NMS. To remove redundant candidate boxes with high accuracy, low latency, and low power, this paper proposes a Soft-NMS-based architecture that uses logarithmic functions to simplify the complex floating-point calculations and combines fine-grained pipelining with coarse-grained parallelism in a two-level optimization structure to further raise throughput. The architecture was evaluated on a Xilinx KU-115 FPGA development board: its power consumption is 6.107 W, and the latency for processing 992 candidate boxes (given as 1000 boxes in the journal's English abstract) is 168.95 μs. Compared with a CPU implementation of Soft-NMS, the architecture achieves a 36-fold performance improvement, and its performance-to-power ratio is 264 times that of the CPU implementation.
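The contrast the abstract draws between Hard-NMS and Soft-NMS can be illustrated with a short software sketch. It is not the paper's hardware design. It assumes the Gaussian score decay variant of Soft-NMS, and the names `iou`, `hard_nms`, `soft_nms`, `soft_nms_log`, `sigma`, and `score_thresh` are illustrative choices, not taken from the paper. The abstract also does not say how logarithms are used. `soft_nms_log` shows one plausible reading, in which scores are kept in the log domain so that the multiplicative decay becomes a subtraction.

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def hard_nms(boxes, scores, iou_thresh=0.5):
    """Hard-NMS: delete every box overlapping the current best by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Soft-NMS (Gaussian decay, assumed): overlapping boxes are down-weighted, not deleted."""
    scores = list(scores)  # work on a copy so the caller's scores are untouched
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append((best, scores[best]))
        survivors = []
        for i in remaining:
            o = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(o * o) / sigma)  # decay grows with overlap
            if scores[i] >= score_thresh:
                survivors.append(i)
        remaining = survivors
    return keep

def soft_nms_log(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Same selection as soft_nms, with scores held as logarithms (assumes scores > 0).

    Illustrative assumption only: the update is a subtraction, not exp and multiply.
    """
    log_scores = [math.log(s) for s in scores]
    log_thresh = math.log(score_thresh)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: log_scores[i])
        remaining.remove(best)
        keep.append((best, math.exp(log_scores[best])))
        survivors = []
        for i in remaining:
            o = iou(boxes[best], boxes[i])
            log_scores[i] -= (o * o) / sigma
            if log_scores[i] >= log_thresh:
                survivors.append(i)
        remaining = survivors
    return keep
```

For two heavily overlapping boxes and one separate box, `hard_nms` discards the lower-scoring overlapping box outright, whereas `soft_nms` keeps it with a reduced score, so a genuinely distinct but overlapping object can still be reported.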

Keywords: reconfigurable computing; object detection; non-maximum suppression

Classification code: TP391.9 [Automation and Computer Technology: Computer Application Technology]

 
