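Affiliations: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 201100 [2] Huawei Cloud, Huawei Technologies Co., Ltd., Hangzhou 310051 [3] Shenzhen HiSilicon Semiconductor Co., Ltd., Shenzhen 518116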
Source: Journal of Image and Graphics (《中国图象图形学报》), 2021, No. 7, pp. 1604-1613 (10 pages)
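Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project of the Ministry of Science and Technology (2018AAA0100400); National Natural Science Foundation of China (61971277).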
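Abstract: Objective Stereo vision is a good solution to the object distance estimation problem. Existing stereo object distance estimation methods suffer from either low estimation accuracy or cumbersome data preparation, so an algorithm is needed that balances accuracy with ease of data preparation. Method We propose a network based on the R-CNN (region convolutional neural network) structure that performs object detection and object distance estimation simultaneously. After a stereo image pair is fed into the network, a backbone network extracts features, a stereo region proposal network yields the bounding boxes of the same object in the left and right images, and the local features inside each pair of boxes are passed to an object disparity estimation branch to estimate the object's distance. To obtain the bounding boxes of the same object in both images at once, the stereo region proposal network replaces the original region proposal network, and a stereo bounding-box branch regresses the left and right boxes jointly. To improve the accuracy of disparity estimation, a disparity estimation branch based on group-wise correlation and 3D convolution is proposed, borrowing from the structure of stereo disparity map estimation networks. Results In validation experiments on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, the mean relative error of the proposed algorithm is about 3.2%, far lower than that of the algorithm based on stereo disparity map estimation (11.3%) and close to that of the algorithm based on 3D object detection (about 3.9%). The proposed disparity estimation branch also improves accuracy markedly, reducing the mean relative error from 5.1% to 3.2%. Similar experiments on a separately collected and annotated pedestrian surveillance dataset give a mean relative error of about 4.6%, which shows that the method can be applied effectively to surveillance scenes. Conclusion The proposed stereo object distance estimation network combines the advantages of object detection and stereo disparity estimation and achieves high accuracy. It can be applied effectively to vehicle-mounted cameras and surveillance scenes, and is promising for other settings equipped with stereo cameras.

English abstract: Objective Object distance estimation is a fundamental problem in 3D vision. However, most successful object distance estimators need extra 3D information from active depth cameras or laser scanners, which increases the cost. Stereo vision is a convenient and cheap solution to this problem. Modern object distance estimation solutions are mainly based on deep neural networks, which provide better accuracy than traditional methods. Deep learning-based solutions are of two main types. The first combines a 2D object detector with a stereo image disparity estimator. The disparity estimator outputs depth information for the image, and the object detector detects object boxes or masks in the image. The detected boxes or masks are then applied to the depth image to extract the pixel depths inside each detection; these depths are sorted, and the closest one is selected to represent the distance of the object. However, according to the experiments, such systems are not accurate enough to solve this problem. The second solution is to use a monocular 3D object detector. Such detectors output 3D bounding boxes of objects, which indicate their distance. 3D object detectors are more accurate but need annotations of 3D bounding-box coordinates for training, which require special devices to collect data and entail high labelling costs. Therefore, we need a solution that achieves good accuracy while keeping model training simple. Method We propose a region convolutional neural network (R-CNN)-based network that performs object detection and distance estimation from stereo images simultaneously. This network can be trained using only object distance labels, which makes it easy to apply to many fields such as surveillance scenes and robot motion. We use a stereo region proposal network to extract proposals of the corresponding target bounding boxes from the left-view and right-view images in one step. Then, a stereo bounding-box regression module regresses the corresponding bounding-box coordinates simultaneously. The disparity could be calculated from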
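The abstract relates object disparity to object distance and reports accuracy as a mean relative error. A minimal sketch of both quantities follows, assuming a rectified stereo pair and the standard pinhole relation Z = f · B / d. The function names and the focal length, baseline, and disparity values are hypothetical illustrations, not values or code from the paper, and the sketch does not reproduce the paper's learned disparity estimation branch.

```python
from statistics import mean


def disparity_to_distance(disparity_px, focal_px, baseline_m):
    """Convert a disparity (pixels) to a distance (metres) via Z = f * B / d.

    Assumes a rectified stereo pair; focal_px is the focal length in pixels
    and baseline_m is the distance between the two camera centres in metres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def mean_relative_error(predicted, ground_truth):
    """Average of |predicted - true| / true over all objects."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(p - g) / g for p, g in zip(predicted, ground_truth))


if __name__ == "__main__":
    # Hypothetical camera: 700 px focal length, 0.5 m baseline.
    distance = disparity_to_distance(35, focal_px=700, baseline_m=0.5)
    print(distance)

    # Hypothetical predicted vs. labelled distances for two objects.
    print(mean_relative_error([10.5, 19.0], [10.0, 20.0]))
```

Because distance is inversely proportional to disparity, a fixed disparity error of one pixel costs far more accuracy for distant objects than for nearby ones, which is one reason the quality of the disparity estimate dominates the reported error.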
Keywords: stereo vision; object distance estimation; disparity estimation; deep neural network; 3D convolution; surveillance scenes
Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]