Authors: Guangkai Xu; Feng Zhao (National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China)
Affiliation: [1] National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
Source: JUSTC (Journal of University of Science and Technology of China), 2024, Issue 4, pp. 13-22, 12, 66 (12 pages in total)
Funding: Supported by the Anhui Provincial Natural Science Foundation (2108085UD12).
Abstract: Monocular depth estimation methods have achieved excellent robustness on diverse scenes, usually by predicting affine-invariant depth, up to an unknown scale and shift, rather than metric depth, because it is much easier to collect large-scale affine-invariant depth training data. However, in some video-based scenarios such as video depth estimation and 3D scene reconstruction, the unknown scale and shift residing in per-frame predictions may cause the predicted depth to be inconsistent across frames. To tackle this problem, we propose a locally weighted linear regression method that recovers the scale and shift maps from very sparse anchor points, which ensures consistency along consecutive frames. Extensive experiments show that our method significantly reduces the Rel error (relative error) of existing state-of-the-art approaches on several zero-shot benchmarks. Besides, we merge 6.3 million RGBD images to train robust depth models. By locally recovering scale and shift, our ResNet50-backbone model even outperforms the state-of-the-art DPT ViT-Large model. Combined with geometry-based reconstruction methods, we formulate a new dense 3D scene reconstruction pipeline, which benefits from both the scale consistency of sparse points and the robustness of monocular methods. By performing simple per-frame prediction over a video, accurate 3D scene geometry can be recovered.
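The core idea of the abstract can be illustrated with a minimal sketch of locally weighted linear regression: at each pixel, fit a scale s and shift t so that s·pred + t agrees with the metric depths of the sparse anchors, with each anchor weighted by a Gaussian of its spatial distance to the pixel. This is an illustrative reconstruction, not the authors' implementation; the function name, the Gaussian kernel, and the bandwidth sigma are assumptions.

```python
import numpy as np

def recover_scale_shift_maps(pred, anchors_uv, anchors_z, sigma=32.0):
    """Illustrative locally weighted linear regression (hypothetical API).

    pred       : (H, W) affine-invariant depth prediction
    anchors_uv : (N, 2) anchor pixel coordinates as (x, y)
    anchors_z  : (N,)   metric depth at each anchor
    Returns per-pixel scale and shift maps such that
    scale * pred + shift approximates metric depth.
    """
    H, W = pred.shape
    scale = np.empty((H, W))
    shift = np.empty((H, W))
    d = pred[anchors_uv[:, 1], anchors_uv[:, 0]]  # predicted depth at anchors
    z = anchors_z                                  # metric depth at anchors
    for i in range(H):
        for j in range(W):
            # Gaussian spatial weight of each anchor w.r.t. pixel (j, i)
            dist2 = (anchors_uv[:, 0] - j) ** 2 + (anchors_uv[:, 1] - i) ** 2
            w = np.exp(-dist2 / (2.0 * sigma ** 2)) + 1e-8
            # Weighted least squares for (s, t) in s*d + t ≈ z:
            # solve the 2x2 normal equations in closed form.
            sw, swd, swdd = w.sum(), (w * d).sum(), (w * d * d).sum()
            swz, swdz = (w * z).sum(), (w * d * z).sum()
            det = swdd * sw - swd ** 2
            scale[i, j] = (swdz * sw - swd * swz) / det
            shift[i, j] = (swdd * swz - swd * swdz) / det
    return scale, shift
```

Because the regression is solved independently around every pixel, anchors influence only their spatial neighborhood, which is what lets the recovered scale/shift vary locally rather than being a single global alignment. A per-pixel loop is shown for clarity; a real implementation would solve on a coarse grid or vectorize.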
Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]