Authors: Guangkai Xu; Feng Zhao (National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China)
Affiliation: [1] National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
Source: JUSTC (Journal of University of Science and Technology of China), 2024, Issue 4, pp. 13-22, 12, 66 (12 pages in total)
Funding: Supported by the Anhui Provincial Natural Science Foundation (2108085UD12).
Abstract: Monocular depth estimation methods have achieved excellent robustness on diverse scenes, usually by predicting affine-invariant depth, up to an unknown scale and shift, rather than metric depth, because it is much easier to collect large-scale affine-invariant depth training data. However, in some video-based scenarios such as video depth estimation and 3D scene reconstruction, the unknown scale and shift residing in per-frame predictions may cause the predicted depth to be inconsistent across frames. To tackle this problem, we propose a locally weighted linear regression method that recovers the scale and shift maps from very sparse anchor points, which ensures consistency along consecutive frames. Extensive experiments show that our method significantly reduces the Rel error (relative error) of existing state-of-the-art approaches on several zero-shot benchmarks. Besides, we merge 6.3 million RGBD images to train robust depth models. By locally recovering scale and shift, our ResNet50-backbone model even outperforms the state-of-the-art DPT ViT-Large model. Combined with geometry-based reconstruction methods, we formulate a new dense 3D scene reconstruction pipeline, which benefits from both the scale consistency of sparse points and the robustness of monocular methods. By performing simple per-frame prediction over a video, accurate 3D scene geometry can be recovered.
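The core idea of the abstract can be illustrated with a minimal sketch of locally weighted linear regression: at each pixel, fit a scale s and shift t so that s·pred + t agrees with the metric depths of the sparse anchors, with each anchor weighted by a Gaussian of its spatial distance to the pixel. This is an illustrative reconstruction, not the authors' implementation; the function name, the Gaussian kernel, and the bandwidth sigma are assumptions.

```python
import numpy as np

def recover_scale_shift_maps(pred, anchors_uv, anchors_z, sigma=32.0):
    """Illustrative locally weighted linear regression (hypothetical API).

    pred       : (H, W) affine-invariant depth prediction
    anchors_uv : (N, 2) anchor pixel coordinates as (x, y)
    anchors_z  : (N,)   metric depth at each anchor
    Returns per-pixel scale and shift maps such that
    scale * pred + shift approximates metric depth.
    """
    H, W = pred.shape
    scale = np.empty((H, W))
    shift = np.empty((H, W))
    d = pred[anchors_uv[:, 1], anchors_uv[:, 0]]  # predicted depth at anchors
    z = anchors_z                                  # metric depth at anchors
    for i in range(H):
        for j in range(W):
            # Gaussian spatial weight of each anchor w.r.t. pixel (j, i)
            dist2 = (anchors_uv[:, 0] - j) ** 2 + (anchors_uv[:, 1] - i) ** 2
            w = np.exp(-dist2 / (2.0 * sigma ** 2)) + 1e-8
            # Weighted least squares for (s, t) in s*d + t ≈ z:
            # solve the 2x2 normal equations in closed form.
            sw, swd, swdd = w.sum(), (w * d).sum(), (w * d * d).sum()
            swz, swdz = (w * z).sum(), (w * d * z).sum()
            det = swdd * sw - swd ** 2
            scale[i, j] = (swdz * sw - swd * swz) / det
            shift[i, j] = (swdd * swz - swd * swdz) / det
    return scale, shift
```

Because the regression is solved independently around every pixel, anchors influence only their spatial neighborhood, which is what lets the recovered scale/shift vary locally rather than being a single global alignment. A per-pixel loop is shown for clarity; a real implementation would solve on a coarse grid or vectorize.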
Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]