Generalization optimization method for text to material texture maps based on diffusion model

Authors: TU Qinghao, LI Yuanqi, LIU Yifan[1], GUO Jie[1], GUO Yanwen[1] (School of Computer Science, Nanjing University, Nanjing, Jiangsu 210033, China)

Affiliation: [1] School of Computer Science, Nanjing University, Nanjing, Jiangsu 210033, China

Source: Journal of Graphics (《图学学报》), 2025, Issue 1, pp. 139-149 (11 pages)

Abstract: Existing material texture map datasets lack sufficient textual descriptions, whereas image-only datasets are available at a very large scale. In addition, when a traditional generative model makes an inference error, it offers no additional hyperparameters that can be adjusted to generate a new result. To address these problems, a generalization optimization method for text-to-material texture map generation based on the Stable Diffusion model was proposed. The model was trained in stages: first, the diffusion model was fine-tuned on a large-scale image-only dataset to fit image generation; second, a small-scale dataset with text annotations was used to learn semantic information; third, a new decoder was introduced to reconstruct material texture maps from the latent codes generated by the diffusion model. Given a text description as input, the method produces multiple randomly generated sets of material texture maps that conform to the description. The code was organized with the Colossal architecture, which greatly lowered the hardware requirements for training. By separating image fitting from semantic learning, using large-scale image datasets to fit the model parameters and small-scale text data to learn semantics, the method improved the generalization of the model and reduced the required scale of multimodal datasets.
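The staged schedule in the abstract can be made concrete with a small, dependency-free sketch. Everything here beyond the three stages and their data sources is an assumption, not the authors' code: the stage and module names (`unet`, `material_decoder`) are hypothetical, as is the choice to pair image-only data with an empty prompt. `add_noise` is the standard DDPM forward-noising step that latent diffusion models such as Stable Diffusion are trained against. `sample_initial_latents` illustrates why a text prompt can yield several different results: each seed produces different starting noise.

```python
"""Hypothetical sketch of the three-stage schedule described in the abstract.

Assumptions (not from the paper): module names, the empty prompt used for
image-only data, and which modules are trainable in each stage.
"""
from dataclasses import dataclass
import math
import random


@dataclass(frozen=True)
class Stage:
    name: str
    dataset: str       # data source consumed by this stage
    trainable: tuple   # hypothetical names of modules updated in this stage
    uses_text: bool    # whether captions condition the denoiser


# Hypothetical schedule mirroring the three steps listed in the abstract.
STAGES = (
    Stage("fit_images", "large image-only set", ("unet",), uses_text=False),
    Stage("learn_semantics", "small captioned set", ("unet",), uses_text=True),
    Stage("train_decoder", "material texture maps", ("material_decoder",), uses_text=False),
)


def make_example(stage, image, caption=None):
    """Pair an image with the prompt the model sees in this stage.

    Assumption: image-only data is trained with an empty prompt, so the model
    learns the image distribution without any text signal.
    """
    if stage.uses_text:
        if caption is None:
            raise ValueError(f"stage {stage.name!r} needs captioned data")
        return image, caption
    return image, ""


def add_noise(x0, eps, alpha_bar):
    """DDPM forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return [math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
            for a, e in zip(x0, eps)]


def sample_initial_latents(n, dim, seed):
    """Draw n Gaussian starting latents; different seeds give different results."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, stage.trainable, make_example(stage, "tile.png", "rusty metal"))
```

Keeping the stage description as data rather than as three separate training scripts makes the key idea of the abstract explicit: only the second stage depends on captioned data, so the scarce multimodal annotations are confined to one step.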

Keywords: diffusion model; generalization; multimodal; text-driven material texture map generation; material editor

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
