3D Gaussian Splatting Semantic Segmentation and Editing Based on 2D Feature Distillation


Authors: LIU Gaoyi; HU Ruizhen; LIU Ligang[1] (School of Mathematical Sciences, University of Science and Technology of China, Hefei, Anhui 230026, China; College of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China)

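Affiliations: [1] School of Mathematical Sciences, University of Science and Technology of China, Hefei, Anhui 230026; [2] College of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060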

Source: Journal of Graphics (图学学报), 2025, No. 2, pp. 312-321 (10 pages)

Funding: National Natural Science Foundation of China (62025207).

Abstract: Semantic understanding of 3D scenes is one of the fundamental ways humans perceive the world. Semantic tasks such as open-vocabulary segmentation and semantic editing are important research areas in computer vision and computer graphics. However, the lack of large, diverse 3D open-vocabulary segmentation datasets makes it difficult to directly train a robust and generalizable model. To address this issue, 3D Gaussian splatting based on 2D feature distillation is proposed, a method that distills semantic embeddings from the large models SAM and CLIP into 3D Gaussians. For each scene, pixel-wise semantic features are obtained via SAM and CLIP, and a scene-specific semantic feature field is then trained using differentiable rendering of 3D Gaussians. For the semantic segmentation task, a multi-step segmentation mask selection strategy is designed to obtain accurate segmentation boundaries for each object in the scene; it yields accurate open-vocabulary semantic segmentation of novel-view images without tedious hierarchical feature extraction and training. The explicit 3D Gaussian scene representation effectively establishes the correspondence between text and 3D objects, which enables semantic editing. Experiments show that the method achieves comparable or better qualitative and quantitative results in semantic segmentation than the compared methods, while also enabling open-vocabulary semantic editing through the 3D Gaussian semantic feature field.
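The distillation step described in the abstract can be illustrated with a minimal sketch for a single pixel. It assumes the usual front-to-back alpha compositing used in 3D Gaussian splatting, applied to per-Gaussian feature vectors instead of colors, and a cosine-distance loss against the 2D target feature. The abstract does not specify the loss or the feature dimensions, so the loss form and all function names here (`composite_feature`, `cosine_similarity`, `distillation_loss`) are hypothetical and are not taken from the paper.

```python
import math


def composite_feature(features, alphas):
    """Alpha-composite per-Gaussian feature vectors for one pixel.

    features: feature vectors of the Gaussians covering the pixel,
              sorted from nearest to farthest.
    alphas:   opacity contribution of each Gaussian at that pixel, in [0, 1].
    """
    dim = len(features[0])
    rendered = [0.0] * dim
    transmittance = 1.0  # fraction of light not yet absorbed
    for feature, alpha in zip(features, alphas):
        weight = transmittance * alpha
        for k in range(dim):
            rendered[k] += weight * feature[k]
        transmittance *= 1.0 - alpha
    return rendered


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def distillation_loss(rendered, target):
    """Assumed loss: cosine distance between rendered and 2D target feature."""
    return 1.0 - cosine_similarity(rendered, target)


# Two Gaussians with 2-dimensional toy features (real embeddings are much larger).
pixel_feature = composite_feature([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
loss = distillation_loss(pixel_feature, [1.0, 0.0])
```

In an actual pipeline this compositing would run inside a differentiable rasterizer over all pixels, and the same cosine similarity could be used at query time to compare rendered features with a text embedding; both of those details are assumptions beyond what the abstract states.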

Keywords: 3D scenes; 3D Gaussian splatting; semantic segmentation; feature field; open-vocabulary semantic editing

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
