Authors: WANG Sicheng; JIANG Hao [1,2]; CHEN Xiao
Affiliations: [1] School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; [2] National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, Jiangsu, China
Source: Computer Engineering, 2024, Issue 11, pp. 266-275 (10 pages)
Funding: National Natural Science Foundation of China (62101273); Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University (2022D10).
Abstract: Existing deep Multi-View Stereo (MVS) methods introduce Transformers into cascade networks to achieve high-resolution depth estimation and thereby produce highly accurate and complete 3D reconstruction results. However, Transformer-based methods are limited by their computational cost and cannot be extended to finer stages. To address this, a novel cross-scale Transformer-based MVS network is proposed that handles feature representations at different stages without incurring additional computation. An Adaptive Matching-aware Transformer (AMT) is introduced that applies different combinations of interactive attention at multiple scales; this strategy enables the network to capture contextual information within images and to strengthen feature relationships between images. In addition, Dual Feature Guided Aggregation (DFGA) is designed to embed coarse global semantic information into finer cost volume construction, further enhancing the perception of global and local features. A feature metric loss is also designed to evaluate feature deviation before and after the transformation, reducing the impact of feature mismatches on depth estimation. Experimental results show that the completeness and overall metrics of the proposed network reach 0.264 and 0.302 on the DTU dataset, and the average reconstruction scores on the two large-scale Tanks and Temples benchmark sets reach 64.28 and 38.03, respectively.
Keywords: multi-view stereo; feature matching; Transformer network; attention mechanism; 3D reconstruction
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
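The feature metric loss mentioned in the abstract evaluates how far feature maps drift under the Transformer stage. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch assuming a simple mean absolute (L1) deviation between feature maps before and after transformation, optionally restricted to valid pixels; the function and parameter names are hypothetical, not from the paper.

```python
import numpy as np

def feature_metric_loss(feat_before, feat_after, mask=None):
    """Sketch of a feature metric loss (assumed L1 form): mean absolute
    deviation between feature maps before and after the Transformer,
    optionally restricted to a boolean mask of valid pixels."""
    deviation = np.abs(feat_after - feat_before)  # per-element feature deviation
    if mask is not None:
        deviation = deviation[mask]  # keep only valid positions
    return float(deviation.mean())

# Toy usage: two 4x4 single-channel feature maps differing by 0.5 everywhere.
before = np.zeros((4, 4))
after = np.full((4, 4), 0.5)
loss = feature_metric_loss(before, after)
```

In a real MVS pipeline this term would be weighted and summed with the depth-regression loss; the sketch only shows the deviation measurement itself.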