Authors: TU Qinghao (涂晴昊); LI Yuanqi (李元琪); LIU Yifan (刘一凡)[1]; GUO Jie (过洁)[1]; GUO Yanwen (郭延文)[1] (School of Computer Science, Nanjing University, Nanjing, Jiangsu 210033, China)
Source: Journal of Graphics (《图学学报》), 2025, No. 1, pp. 139-149 (11 pages)
Abstract: Existing material texture datasets lack sufficient textual descriptions while image-only datasets are large in scale, and traditional generative models offer no additional hyperparameters for producing a new result when inference goes wrong. To address these problems, a generalization optimization method for text-to-material-texture-map generation based on the Stable Diffusion model is proposed. The model is trained in stages: first, a large-scale image-only dataset is used to fine-tune the diffusion model to fit image generation; second, a small-scale text-annotated dataset is used to learn semantic information; third, a new decoder is introduced to reconstruct material texture maps from the latent codes generated by the diffusion model. Given a text description as input, the method then yields multiple randomly generated sets of material texture maps that match the description. The code is organized with the Colossal architecture, which greatly reduces the hardware requirements for training. Separating image fitting from semantic learning, with large-scale image data fitting the model parameters and small-scale text data supplying the semantic information, improves the model's generalization and reduces the required scale of multimodal datasets.
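The staged workflow summarized in the abstract can be laid out as a small data structure, which may help when reading the method. The following is only an illustrative sketch: the `Stage` class, its field names, the stage labels, and the `generate` helper are hypothetical and not taken from the paper, and the abstract does not say which data the decoder stage uses or whether it is text-conditioned, so those entries are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Stage:
    """One training stage; every field name here is illustrative."""
    name: str
    dataset: Optional[str]  # kind of data consumed; None where the abstract does not say
    uses_text: bool         # whether text annotations take part in this stage
    trains: str             # component fitted in this stage


def build_schedule() -> List[Stage]:
    """Return the three stages in the order the abstract lists them."""
    return [
        # Stage 1: fit image generation on a large image-only dataset.
        Stage("image_fitting", "large-scale image-only", False, "diffusion model"),
        # Stage 2: learn semantic information from a small text-annotated dataset.
        Stage("semantic_learning", "small-scale text-annotated", True, "diffusion model"),
        # Stage 3: new decoder from latent codes to material texture maps.
        # Dataset and text usage are not stated in the abstract (assumed here).
        Stage("decoder", None, False, "new decoder"),
    ]


def generate(
    prompt: str,
    seeds: Sequence[int],
    sample_latent: Callable[[str, int], Any],
    decode: Callable[[Any], Any],
) -> List[Any]:
    """Produce one decoded result per random seed for the same text prompt.

    `sample_latent` and `decode` stand in for the trained diffusion model
    and the new decoder; both are supplied by the caller.
    """
    return [decode(sample_latent(prompt, seed)) for seed in seeds]
```

Calling `generate` with several seeds mirrors the abstract's point that one text description yields multiple randomly generated results; the real sampler and decoder would replace the caller-supplied stand-ins.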
Keywords: diffusion model; generalization; multimodal; text-driven material texture map generation; material editor
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]