Multi-scale Transformer for View-based 3D Shape Analysis (基于多尺度Transformer的多视图三维形状分析方法)

Authors: WEI Xin; SUN Jian (School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049)

Source: Chinese Journal of Engineering Mathematics (《工程数学学报》), 2024, No. 1, pp. 164-174 (11 pages)

Funding: National Natural Science Foundation of China (12125104).

Abstract: View-based 3D shape analysis is an important research branch of 3D computer vision. These methods recognise and retrieve 3D shapes by integrating features extracted from 2D images of the shape taken from multiple viewpoints. However, how to effectively explore the relationships between different views, and how to use these relationships to aggregate multi-view image features, remains a core open problem in 3D shape analysis. Inspired by the recent success of Transformer networks in relation modelling, this work introduces a novel multi-scale Transformer architecture and proposes the Multi-View Multi-Scale Transformer (MVMST) for 3D shape analysis. MVMST effectively learns the relationships between different views and aggregates multi-view image features into a single, highly expressive global descriptor. Whereas previous approaches model the relationships between multi-view features with a Transformer whose receptive field is global, MVMST draws on multi-scale learning: multi-scale Transformers model the relationships between multi-view image features at different scales, and a multi-scale fusion module merges the Transformer-processed features from those scales into a multi-scale representation that is more effective than a single-scale one. A view pooling module finally fuses the multi-scale representations of the individual views into a global descriptor of the 3D shape. Experiments on several synthetic and real-scanned 3D shape classification datasets show that the proposed method achieves promising performance on 3D shape classification.
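The pipeline outlined in the abstract (attention over view features at several scales, fusion across scales, then view pooling) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a "scale" is modelled as a circular neighbourhood radius over views placed around the object, attention is single-head without learned projections, fusion is a plain average, and view pooling is an element-wise maximum. The names `windowed_self_attention` and `multi_scale_descriptor` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def windowed_self_attention(feats, radius):
    """Single-head self-attention over view features of shape (V, D).

    Assumption: views are ordered on a circle, and view i may attend only
    to views whose circular index distance is at most `radius`.
    No learned Q/K/V projections are used in this sketch.
    """
    n_views, dim = feats.shape
    idx = np.arange(n_views)
    diff = np.abs(idx[:, None] - idx[None, :])
    circ_dist = np.minimum(diff, n_views - diff)
    mask = circ_dist <= radius  # the diagonal is always allowed

    scores = feats @ feats.T / np.sqrt(dim)
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ feats


def multi_scale_descriptor(feats, radii):
    """Fuse per-scale attention outputs, then pool over views.

    Assumptions: residual connection per scale, mean fusion across
    scales, max pooling across views.
    """
    per_scale = [feats + windowed_self_attention(feats, r) for r in radii]
    fused = np.mean(per_scale, axis=0)  # (V, D) multi-scale representation
    return fused.max(axis=0)            # (D,) global shape descriptor


# Usage: 12 views with 8-dimensional features, three receptive-field sizes.
rng = np.random.default_rng(0)
view_feats = rng.normal(size=(12, 8))
descriptor = multi_scale_descriptor(view_feats, radii=[1, 3, 6])
```

With 12 views, a radius of 6 covers every view, so the largest scale in the usage example reduces to ordinary global attention, while the smaller radii restrict each view to its nearest neighbours. A trained model would replace the identity projections, the mean fusion, and the max pooling with learned modules.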

Keywords: 3D shape analysis; Transformer; multi-scale methods

Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
