Authors: TONG Wei; ZHANG Miaomiao; LI Dongfang; WU Qi; SONG Aiguo
Affiliations: [1] School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; [2] School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [3] School of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China; [4] School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China
Source: Journal of Electronics & Information Technology, 2023, No. 10, pp. 3483-3491 (9 pages)
Funding: National Natural Science Foundation of China (U1933125, 62171274); NSFC "Ye Qisun" Science Fund Key Project (U2241228); National Defense Innovation Zone Projects (193-CXCY-A04-01-11-03, 223-CXCY-A04-05-09-01); Shanghai Municipal Science and Technology Major Project (2021SHZDZX).
Abstract: Learning-based Multi-View Stereo (MVS) aims to reconstruct a dense 3D representation of a scene from multiple views. However, existing methods typically design complex 2D network modules to learn cross-view visibility for cost-volume aggregation, ignoring the consistency assumption of cross-view 2D contextual features along the 3D depth direction. In addition, current multi-stage depth-inference methods still require a high depth sampling rate and sample depth hypotheses within a static, preset range, which is prone to erroneous depth inference at object boundaries and in occluded or poorly lit regions. To alleviate these problems, a dense depth-inference model based on an edge-assisted epipolar Transformer is proposed. The improvements over existing work are as follows: depth regression is reformulated as multi-depth-hypothesis classification, which preserves inference accuracy under a limited depth sampling rate and GPU memory budget; an epipolar Transformer block is designed for more reliable cross-view cost aggregation, with an edge-detection branch constraining the consistency of edge features along the epipolar direction; and a dynamic depth-range sampling mechanism based on the probabilistic cost volume is introduced to improve accuracy in weakly textured and uncertain regions. Comprehensive comparisons with mainstream methods on public benchmarks show that the proposed model reconstructs dense, accurate 3D scenes within a limited memory footprint. In particular, compared with Cas-MVSNet, the proposed model reduces GPU memory consumption by 35%, lowers the depth sampling rate by about 50%, and decreases the overall error on the DTU dataset from 0.355 to 0.325.
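The two sampling-related ideas in the abstract, classifying over a discrete set of depth hypotheses instead of regressing a single depth, and deriving the next stage's depth range from the resulting probability volume, can be sketched as below. This is a minimal NumPy illustration under assumed shapes and an assumed variance-based interval rule; it is not the paper's implementation, and the depth range and the factor `k` are placeholders.

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax over the depth-hypothesis axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# assumed sizes: D depth hypotheses for an H x W depth map
D, H, W = 8, 4, 4
rng = np.random.default_rng(0)
depth_hyps = np.linspace(425.0, 935.0, D)   # sampled depth values (illustrative range)
logits = rng.normal(size=(D, H, W))         # per-hypothesis matching scores from the cost volume

# classification over depth hypotheses -> probability volume
prob = softmax(logits, axis=0)

# expected depth map via soft-argmax over the hypotheses (shape: H x W)
exp_depth = np.tensordot(depth_hyps, prob, axes=(0, 0))

# dynamic depth range for the next stage: mean +/- k * std of the
# per-pixel depth distribution encoded by the probability volume
var = np.tensordot(depth_hyps**2, prob, axes=(0, 0)) - exp_depth**2
std = np.sqrt(np.maximum(var, 0.0))
k = 1.0  # assumed interval-width factor
next_range = (exp_depth - k * std, exp_depth + k * std)
```

Pixels with a peaked distribution get a narrow interval while uncertain pixels (e.g. weakly textured regions) keep a wide one, which is the intuition behind adapting the sampling range instead of using a static, preset one.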
Keywords: Multi-view scene reconstruction; Multi-view stereo (MVS); Depth estimation; Epipolar geometry; Transformer
Classification code: TP391.4 (Automation and Computer Technology: Computer Application Technology)