Authors: WANG Sicheng; JIANG Hao [1,2]; CHEN Xiao
Affiliations: [1] School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; [2] National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, Jiangsu, China
Source: Computer Engineering, 2024, Issue 11, pp. 266-275 (10 pages)
Funding: National Natural Science Foundation of China (62101273); Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University (2022D10).
Abstract: Existing deep Multi-View Stereo (MVS) methods introduce Transformers into cascade networks to achieve high-resolution depth estimation and thereby produce highly accurate and complete 3D reconstruction results. However, Transformer-based methods are limited by their computational cost and cannot be extended to finer stages. To address this, a novel cross-scale Transformer-based MVS network is proposed that handles feature representations at different stages without incurring additional computation. An Adaptive Matching-aware Transformer (AMT) is introduced that applies different combinations of interactive attention at multiple scales; this strategy enables the network to capture contextual information within images and to strengthen feature relationships between images. In addition, Dual Feature Guided Aggregation (DFGA) is designed to embed coarse global semantic information into finer cost volume construction, further enhancing the perception of global and local features. A feature metric loss is also designed to evaluate feature deviation before and after the transformation, reducing the impact of feature mismatches on depth estimation. Experimental results show that the completeness and overall metrics of the proposed network reach 0.264 and 0.302 on the DTU dataset, and the average reconstruction scores on the two large-scale Tanks and Temples benchmark sets reach 64.28 and 38.03, respectively.
Keywords: multi-view stereo; feature matching; Transformer network; attention mechanism; 3D reconstruction
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
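The feature metric loss mentioned in the abstract evaluates how far feature maps drift under the Transformer stage. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch assuming a simple mean absolute (L1) deviation between feature maps before and after transformation, optionally restricted to valid pixels; the function and parameter names are hypothetical, not from the paper.

```python
import numpy as np

def feature_metric_loss(feat_before, feat_after, mask=None):
    """Sketch of a feature metric loss (assumed L1 form): mean absolute
    deviation between feature maps before and after the Transformer,
    optionally restricted to a boolean mask of valid pixels."""
    deviation = np.abs(feat_after - feat_before)  # per-element feature deviation
    if mask is not None:
        deviation = deviation[mask]  # keep only valid positions
    return float(deviation.mean())

# Toy usage: two 4x4 single-channel feature maps differing by 0.5 everywhere.
before = np.zeros((4, 4))
after = np.full((4, 4), 0.5)
loss = feature_metric_loss(before, after)
```

In a real MVS pipeline this term would be weighted and summed with the depth-regression loss; the sketch only shows the deviation measurement itself.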