Automatic novel view synthesis for dynamic scenes guided by text semantics


Authors: LIN Yuping[1]; LI Shengpeng; TIAN Fengrui (School of Foreign Studies, Xi'an Jiaotong University, Xi'an 710049, China; College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049, China)

Affiliations: [1] School of Foreign Studies, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China; [2] College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2025, No. 3, pp. 8-13 (6 pages)

Funding: National Key R&D Program of China (2020AAA0108102); Shaanxi Provincial Social Science Foundation (2021K014).

Abstract: A novel view synthesis method for dynamic scenes guided by text priors is proposed. Text describing the dynamic foreground content serves as a semantic prior that guides a segmentation model to automatically generate high-quality foreground and background masks, so that novel view synthesis for dynamic scenes can be achieved without manual annotation. Specifically, Grounding DINO first converts text prompts into bounding-box prompts; the segment anything model (SAM) then takes the original image together with the bounding-box prompts and automatically generates dynamic foreground masks; finally, a dynamic neural radiance field is constructed on these masks to render novel views of the dynamic scene automatically. The effectiveness of the method is validated on the NVIDIA Dynamic Scene dataset. In subjective comparisons, the proposed method exploits the semantically guided prior knowledge to render a clearer dynamic foreground and static background under novel views than other methods. In objective comparisons, it outperforms other state-of-the-art methods on three image-quality metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and learned perceptual image patch similarity (LPIPS).
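The final stage of the pipeline uses the foreground masks to separate dynamic and static content, and the results are scored with PSNR among other metrics. The following is a minimal NumPy sketch of those two ideas only; `composite_with_mask` and `psnr` are hypothetical helpers written for illustration, not the authors' implementation, and the images are toy constants.

```python
import numpy as np

def composite_with_mask(fg_rgb, bg_rgb, mask):
    """Blend a dynamic-foreground rendering with a static-background
    rendering using a soft foreground mask with values in [0, 1]."""
    m = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    return m * fg_rgb + (1.0 - m) * bg_rgb

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two images scaled to [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: 4x4 images where the foreground occupies the left half.
fg = np.full((4, 4, 3), 0.8)   # stand-in for the dynamic foreground rendering
bg = np.full((4, 4, 3), 0.2)   # stand-in for the static background rendering
mask = np.zeros((4, 4))
mask[:, :2] = 1.0              # left half marked as foreground

out = composite_with_mask(fg, bg, mask)
print(out[0, 0, 0], out[0, 3, 0])        # → 0.8 (foreground), 0.2 (background)
print(round(psnr(out, out + 0.01), 2))   # uniform 0.01 error → 40.0 dB
```

A uniform per-pixel error of 0.01 gives an MSE of 1e-4 and hence a PSNR of exactly 40 dB, which is a handy sanity check when wiring up evaluation code.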

Keywords: novel view synthesis; dynamic scenes; text guidance; segment anything model; automatic mask generation

Classification: TP391 (Automation and Computer Technology: Computer Application Technology)

 
