Swin3D: A pretrained transformer backbone for 3D indoor scene understanding  


Authors: Yu-Qi Yang [1,3], Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo

Affiliations: [1] Institute for Advanced Study, Tsinghua University, Beijing 100084, China; [2] Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; [3] Internet Graphics Group, Microsoft Research Asia, Beijing 100080, China; [4] Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China

Source: Computational Visual Media (计算可视媒体, English edition), 2025, Issue 1, pp. 83-101 (19 pages)

Abstract: The use of pretrained backbones with finetuning has shown success for 2D vision and natural language processing tasks, with advantages over task-specific networks. In this paper, we introduce a pretrained 3D backbone, called Swin3D, for 3D indoor scene understanding. We designed a 3D Swin Transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity, making the backbone scalable to large models and datasets. We also introduce a generalized contextual relative positional embedding scheme to capture various irregularities of point signals for improved network performance. We pretrained a large Swin3D model on a synthetic Structured3D dataset, which is an order of magnitude larger than the ScanNet dataset. Our model pretrained on the synthetic dataset not only generalizes well to downstream segmentation and detection on real 3D point datasets but also outperforms state-of-the-art methods on downstream tasks with +2.3 mIoU and +2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation, respectively, +1.8 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, and +8.1 mAP@0.5 on S3DIS detection. A series of extensive ablation studies further validated the scalability, generality, and superior performance enabled by our approach.

Keywords: 3D pretraining; point cloud analysis; transformer backbone; Swin Transformer; 3D semantic segmentation; 3D object detection

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
