A survey on deep learning based visual object detection  Cited by: 92


Authors: Cao Jiale[1], Li Yali[2], Sun Hanqing, Xie Jin, Huang Kaiqi[4], Pang Yanwei[1]

Affiliations: [1] Tianjin University, Tianjin 300072, China; [2] Tsinghua University, Beijing 100084, China; [3] Chongqing University, Chongqing 400044, China; [4] Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Source: Journal of Image and Graphics (中国图象图形学报), 2022, No. 6, pp. 1697-1722 (26 pages)

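Funding: National Key Research and Development Program of China (2018AAA0102800); China Postdoctoral Science Foundation (2021M700613); Chinese Association for Artificial Intelligence (CAAI) and Huawei MindSpore Academic Award Fund.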

Abstract: Visual object detection aims to locate and recognize the objects present in an image. It is one of the classical tasks in computer vision and a prerequisite for many other computer vision tasks. It has important application value in areas such as autonomous driving and video surveillance, and has attracted wide attention from researchers. With the rapid development of deep learning, object detection has made great progress. First, this paper summarizes the basic pipeline of deep object detection during training and testing. The training stage includes data pre-processing, the detection network, label assignment, and loss function computation; the testing stage uses the trained detector to generate detections and then post-processes them. Next, the paper reviews visual object detection methods based on monocular cameras, mainly anchor-based methods, anchor-free methods, and end-to-end prediction methods, and summarizes some common sub-module designs in object detection. After the monocular methods, it introduces visual object detection methods based on stereo cameras. On this basis, it compares research progress in China and abroad for monocular and stereo object detection respectively, and discusses development trends in visual object detection. Through this summary and analysis, the authors hope to provide a reference for researchers working on visual object detection.

Visual object detection aims to locate and recognize objects in images. It is one of the classical tasks in computer vision and the foundation of many other computer vision tasks. It plays a very important role in applications such as autonomous driving and video surveillance, and has attracted extensive attention from researchers over the past few decades. In recent years, with the development of deep learning, visual object detection has made great progress. This paper presents an in-depth survey of deep learning based visual object detection, including monocular object detection and stereo object detection. First, we summarize the pipeline of deep object detection during training and inference. The training process generally consists of data pre-processing, detection network design, and label assignment and loss function. Data pre-processing (e.g., multi-scale training and flipping) aims to enhance the diversity of the given training data, which can improve the performance of the detector. The detection network usually consists of three key parts: the backbone (e.g., Visual Geometry Group (VGG) and ResNet), the feature fusion module (e.g., feature pyramid network (FPN)), and the prediction network (e.g., region of interest head network (RoI head)). Label assignment assigns a ground-truth target to each prediction, and the loss function supervises network training. During inference, we adopt the trained detector to generate detection bounding boxes and employ a post-processing step (e.g., non-maximum suppression (NMS)) to combine the bounding boxes. Second, we present an in-depth review of monocular object detection, covering anchor-based, anchor-free, and end-to-end methods. Anchor-based methods design a set of default anchors and perform classification and regression based on them; they can be further split into two-stage and one-stage methods. Two-stage methods first generate candidate proposals based on the default anchors, and second classify/regress t
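The post-processing step named in the abstract, non-maximum suppression (NMS), can be illustrated with a minimal sketch. This is not taken from the surveyed paper: it shows the common greedy variant, and the corner box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, and the default threshold of 0.5 are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if no already-kept box overlaps it above the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(nms(demo_boxes, demo_scores))
```

In the demo, the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the third box does not overlap either and is kept. Production detectors typically apply this per class and use a vectorized implementation.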

Keywords: visual object detection; deep learning; monocular; stereo; anchor box

Classification No.: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
