Self-supervised Monocular Depth Estimation Based on Semantic Assistance and Depth Temporal Consistency Constraints

Authors: LING Chuanwu, CHEN Hua, XU Dayong[3], ZHANG Xiaogang[1]

Affiliations: [1] College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; [2] College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; [3] Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450000, China

Source: Journal of Hunan University: Natural Sciences, 2024, No. 8, pp. 1-12 (12 pages)

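Funding: National Natural Science Foundation of China (62171184, 62273139, 62106072); Key Project of the NSFC Regional Joint Fund (U23A20385); National Defense Pre-research Project (JCY2021206B015).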

Abstract: Self-supervised monocular depth estimation methods trained on monocular image sequences have received considerable attention in recent years because they replace depth labels with a photometric consistency loss between adjacent frames as the supervisory signal for network training. The photometric consistency constraint relies on the static world assumption, but moving objects in monocular image sequences violate this assumption, which degrades both the accuracy of camera pose estimation and the accuracy of the photometric loss computed during self-supervised training. Detecting and removing moving-object regions yields a camera pose decoupled from object motion and, at the same time, eliminates the influence of these regions on the photometric loss. To this end, this paper proposes a self-supervised monocular depth estimation network based on semantic assistance and depth temporal consistency constraints. First, an offline instance segmentation network detects objects of dynamic categories that may violate the static world assumption, and the corresponding regions are removed from the input of the pose network to obtain a camera pose decoupled from object motion. Second, the motion state of each dynamic-category object is determined from semantic consistency and photometric consistency constraints, so that the photometric loss in moving regions does not affect the iterative update of the network parameters. Finally, a depth temporal consistency constraint is imposed on non-moving regions, explicitly aligning the estimated depth of the current frame with the depth projected from adjacent frames to further refine the depth predictions. Experiments on the KITTI, DDAD and KITTI Odometry datasets show that the proposed method outperforms previous self-supervised monocular depth estimation methods.
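The first step (hiding dynamic-category regions from the pose network) and the idea of keeping moving regions out of the photometric loss can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mask_dynamic_regions` and `masked_photometric_loss` are hypothetical, the instance masks are assumed to come from some offline segmenter, and the loss uses a plain per-pixel L1 error (practical systems often mix SSIM and L1, which is omitted here for brevity).

```python
import numpy as np

def mask_dynamic_regions(image, dynamic_masks):
    """Zero out pixels covered by dynamic-category instances (e.g. cars,
    pedestrians) before the image is fed to the pose network.

    image         : (H, W, C) float array
    dynamic_masks : list of (H, W) bool arrays from an offline instance segmenter
                    (hypothetical input format)
    """
    keep = np.ones(image.shape[:2], dtype=bool)
    for m in dynamic_masks:
        keep &= ~m                                  # drop every dynamic instance
    return image * keep[..., None]                  # broadcast over channels

def masked_photometric_loss(target, warped, static_mask, eps=1e-7):
    """Mean absolute photometric error over pixels flagged as static.

    target, warped : (H, W, C) target frame and source frame warped into it
    static_mask    : (H, W) bool, False on regions judged to be moving
    """
    err = np.abs(target - warped).mean(axis=-1)     # per-pixel L1 error
    w = static_mask.astype(np.float64)
    # Moving pixels get zero weight, so they contribute nothing to the loss
    # (and, in an autodiff framework, nothing to the gradient).
    return float((err * w).sum() / (w.sum() + eps))
```

Zeroing pixels is only one way to "remove" regions from the pose input; cropping or feeding the mask as an extra channel are alternatives, and the abstract does not say which the authors use.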
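The abstract states that the motion state of dynamic-category objects is decided from semantic and photometric consistency, but it does not give the decision rule. The sketch below is a hypothetical heuristic in that spirit, not the paper's criterion: it assumes a source-frame semantic map already warped into the target frame using predicted depth and ego-motion only, plus the per-pixel photometric error of that warp. The function name `is_instance_moving` and the thresholds `sem_thresh` and `photo_ratio` are illustrative assumptions.

```python
import numpy as np

def is_instance_moving(instance_mask, class_id, warped_semantics, photo_err,
                       background_mask, sem_thresh=0.5, photo_ratio=1.5):
    """Hypothetical test of whether one dynamic-category instance is moving.

    instance_mask    : (H, W) bool, pixels of the instance in the target frame
    class_id         : int, semantic class of the instance
    warped_semantics : (H, W) int, source semantic map warped to the target
                       frame with ego-motion only (assumed input)
    photo_err        : (H, W) float, per-pixel photometric error of that warp
    background_mask  : (H, W) bool, static pixels used as a reference
    sem_thresh, photo_ratio : illustrative thresholds, not from the paper
    """
    if not instance_mask.any():
        return False
    # Semantic consistency: a static object warped by ego-motion alone should
    # land on pixels that carry its own class label.
    sem_agree = np.mean(warped_semantics[instance_mask] == class_id)
    # Photometric consistency: a moving object shows unusually high error
    # relative to the static background.
    inst_err = photo_err[instance_mask].mean()
    bg_err = photo_err[background_mask].mean() if background_mask.any() else 0.0
    return bool(sem_agree < sem_thresh or inst_err > photo_ratio * bg_err)
```

A static mask for the photometric loss could then be formed as the complement of the union of all instances flagged as moving, so that stationary cars, for example, still provide supervision.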
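One plausible reading of the depth temporal consistency constraint is: back-project each current-frame pixel with its estimated depth, transform it into the adjacent camera with the predicted relative pose, and compare the resulting projected depth with the adjacent frame's depth map sampled at the projected pixel, only over non-moving pixels. The NumPy sketch below follows that reading under stated assumptions: a pinhole intrinsic matrix `K`, a 4x4 transform `T_t_to_s` from current-frame to adjacent-frame camera coordinates, nearest-neighbour sampling (bilinear sampling is more common in training), and a normalized absolute difference as the penalty. The function name and the choice of penalty are illustrative, not taken from the paper.

```python
import numpy as np

def depth_temporal_consistency(depth_t, depth_s, K, T_t_to_s, static_mask):
    """Mean normalized difference between the depth of current-frame points
    projected into the adjacent view and the adjacent view's depth map.

    depth_t, depth_s : (H, W) predicted depth of current frame t and adjacent frame s
    K                : (3, 3) pinhole camera intrinsics
    T_t_to_s         : (4, 4) rigid transform from frame-t to frame-s camera coords
    static_mask      : (H, W) bool, True where the pixel is treated as non-moving
    """
    H, W = depth_t.shape
    v, u = np.mgrid[0:H, 0:W]                       # row and column indices
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(np.float64)

    # Back-project current-frame pixels to 3D using the estimated depth.
    cam_t = (np.linalg.inv(K) @ pix) * depth_t.reshape(1, -1)
    # Move the points into the adjacent camera's coordinate frame.
    cam_s = T_t_to_s[:3, :3] @ cam_t + T_t_to_s[:3, 3:4]
    z_proj = cam_s[2]                               # projected depth in frame s

    # Project onto the adjacent image plane, avoiding division by z <= 0.
    safe_z = np.where(z_proj > 1e-6, z_proj, 1.0)
    uv_s = (K @ cam_s) / safe_z
    us = np.rint(uv_s[0]).astype(int)
    vs = np.rint(uv_s[1]).astype(int)

    # Keep points in front of the camera, inside the image and non-moving.
    valid = (z_proj > 1e-6) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
    valid &= static_mask.reshape(-1)
    if not valid.any():
        return 0.0

    d_s = depth_s[vs[valid], us[valid]]             # nearest-neighbour lookup
    z = z_proj[valid]
    return float(np.mean(np.abs(z - d_s) / (z + d_s)))
```

For example, if the camera moves 1 unit forward (`T_t_to_s[2, 3] = -1`) in front of a fronto-parallel plane at depth 10, an adjacent depth map that is 9 everywhere is perfectly consistent and the penalty is zero on all valid pixels.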

Keywords: monocular depth estimation; self-supervised learning; moving objects; temporal consistency

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
