Authors: LI Jing-lin (李景琳); JIANG Jing-fei (姜晶菲) [1]; DOU Yong (窦勇) [1]; XU Jin-wei (许金伟); WEN Dong (温冬)
Affiliation: [1] College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China
Source: Computer Engineering & Science (《计算机工程与科学》), 2021, No. 4, pp. 586-593 (8 pages)
Funding: National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2018ZX01028101).
Abstract: Object detection tasks usually use the non-maximum suppression (NMS) algorithm to remove redundant candidate boxes output by a convolutional neural network. Instead of directly deleting candidate boxes whose overlap exceeds a predefined threshold, as Hard-NMS does, Soft-NMS gradually decays their scores, which avoids mistakenly deleting the boxes of overlapping objects in an image and improves detection accuracy. However, frequently updating candidate box scores makes Soft-NMS more complex than Hard-NMS. To remove redundant candidate boxes with high accuracy, low latency, and low power consumption, this paper proposes a Soft-NMS based architecture that uses logarithmic functions to simplify complex floating-point calculations and a two-level optimization structure, combining fine-grained pipelining with coarse-grained parallelism, to further improve throughput. The architecture was evaluated on a Xilinx KU-115 FPGA development board. Its power consumption is 6.107 W, and the latency for processing 992 candidate boxes is 168.95 μs (the journal's English abstract gives 1000 boxes). Compared with a CPU implementation of Soft-NMS, the architecture achieves a 36-fold performance improvement and a performance-per-watt ratio 264 times that of the CPU implementation.
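The difference between Hard-NMS and Soft-NMS described in the abstract can be illustrated with a minimal software sketch. This is not the paper's hardware architecture: the function names (`iou`, `nms`), the `(x1, y1, x2, y2)` box layout, the Gaussian decay `exp(-IoU² / sigma)`, and all default parameter values are assumptions chosen for illustration, since the abstract does not state which decay function or thresholds the authors use.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, soft=True, iou_thresh=0.5, sigma=0.5, score_thresh=0.001):
    """Return (index, final_score) pairs in the order boxes are selected.

    soft=False: Hard-NMS, drop any box whose IoU with the selected box
                exceeds iou_thresh.
    soft=True:  Soft-NMS with an assumed Gaussian decay, keep the box but
                multiply its score by exp(-IoU^2 / sigma).
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the highest-scoring box that is still a candidate.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))

        updated = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            if soft:
                score *= math.exp(-(overlap * overlap) / sigma)
            elif overlap > iou_thresh:
                continue  # Hard-NMS deletes the overlapping box outright.
            if score >= score_thresh:
                updated.append((idx, score))
        remaining = updated
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print("hard:", nms(boxes, scores, soft=False))
    print("soft:", nms(boxes, scores, soft=True))
```

In this toy input the second box overlaps the first heavily, so Hard-NMS discards it while Soft-NMS keeps it with a reduced score. The per-box score update inside the loop is the extra work that makes Soft-NMS costlier than Hard-NMS, which is the cost the abstract's architecture targets.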
Classification: TP391.9 [Automation and Computer Technology: Computer Application Technology]