Research on Generating Hanfu Effect Drawing Based on a Stable Diffusion Model

Authors: LI Zhi (李智) [1], CHEN Yu (陈郁) (School of Textiles and Fashion, Shanghai University of Engineering Science, Shanghai 201620, China)

Affiliation: [1] School of Textiles and Fashion, Shanghai University of Engineering Science, Shanghai 201620

Source: Journal of Beijing Institute of Fashion Technology: Natural Science Edition, 2024, No. 4, pp. 90-97 (8 pages)

Funding: Eastern Scholar Program of the Shanghai Municipal Education Commission (TP2017074).

Abstract: Generated Hanfu renderings often confuse dynasties because the costume features of each dynasty are hard to capture accurately. To address this, this paper builds on the Stable Diffusion model: newly input text prompts are matched against the text and image feature-space vectors, a new token V* is added to the embedding layer, and V* is optimized jointly with the cross-attention parameters W_k and W_v, so that the loss function is minimized after the model learns the new clothing text features. Drawing on literature and historical sources, 163 clothing text prompts covering the Tang, Song, and Ming dynasties were collected, organized, and added. The generated Hanfu renderings show that the model produces garment images consistent with the characteristics of each dynasty given the text prompts, and these images are clearer and of higher quality than those of three commonly used text-to-image algorithms that do not incorporate Hanfu model features. In ablation experiments, optimizing the token V* with a specific ID gives higher image alignment and lower text alignment than the other approaches. In the Tang, Song, and Ming experiments, the mean KID and MMD values are all relatively low, indicating that the proposed model is, to a certain degree, feasible and effective for improving Hanfu rendering generation.
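
The abstract describes adding a new token V* to the text embedding table and optimizing it jointly with the cross-attention key and value projections W_k and W_v. The sketch below is a toy, dependency-free illustration of that parameter selection only, not the authors' code. Everything in it is hypothetical: the width D, the random data, the function names (train, loss_fn, cross_attention), and the mean-squared-error loss, which stands in for the diffusion denoising loss. Gradients are taken by central finite differences instead of backpropagation so the example stays self-contained. Only v_star, w_k, and w_v are updated; w_q and the other token embeddings stay frozen.

```python
import math
import random

D = 4  # toy embedding width (hypothetical; real text encoders are far wider)


def rand_matrix(rows, cols, rng):
    """Small random matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latents, text, w_q, w_k, w_v):
    """Queries come from image latents; keys and values come from text tokens."""
    q = matmul(latents, w_q)
    k = matmul(text, w_k)
    v = matmul(text, w_v)
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(D) for kj in k] for qi in q]
    return matmul([softmax(row) for row in scores], v)


def loss_fn(trainable, frozen, latents, target):
    """Toy MSE loss standing in for the denoising loss (assumption)."""
    # Prompt = frozen context-token embeddings followed by the new token V*.
    text = frozen["context"] + trainable["v_star"]
    out = cross_attention(latents, text, frozen["w_q"], trainable["w_k"], trainable["w_v"])
    errs = [(o - t) ** 2 for orow, trow in zip(out, target) for o, t in zip(orow, trow)]
    return sum(errs) / len(errs)


def train(steps=100, lr=1.0, eps=1e-5, seed=0):
    rng = random.Random(seed)
    frozen = {"w_q": rand_matrix(D, D, rng), "context": rand_matrix(2, D, rng)}
    trainable = {
        "v_star": rand_matrix(1, D, rng),  # embedding row of the new token V*
        "w_k": rand_matrix(D, D, rng),
        "w_v": rand_matrix(D, D, rng),
    }
    latents = rand_matrix(2, D, rng)  # hypothetical image-latent queries
    target = rand_matrix(2, D, rng)  # hypothetical regression target
    history = [loss_fn(trainable, frozen, latents, target)]
    for _ in range(steps):
        # Central finite-difference gradient, computed for trainable entries only.
        grads = {}
        for name, mat in trainable.items():
            grads[name] = [[0.0] * len(row) for row in mat]
            for r, row in enumerate(mat):
                for c, old in enumerate(row):
                    row[c] = old + eps
                    up = loss_fn(trainable, frozen, latents, target)
                    row[c] = old - eps
                    down = loss_fn(trainable, frozen, latents, target)
                    row[c] = old
                    grads[name][r][c] = (up - down) / (2 * eps)
        # Gradient-descent step; frozen parameters are never modified.
        for name, mat in trainable.items():
            for r, row in enumerate(mat):
                for c in range(len(row)):
                    row[c] -= lr * grads[name][r][c]
        history.append(loss_fn(trainable, frozen, latents, target))
    return history, frozen, trainable
```

Calling `history, frozen, trainable = train()` returns the loss trajectory, which should fall while `frozen["w_q"]` stays unchanged. Restricting updates to W_k and W_v is plausible because, in Stable Diffusion's cross-attention, these are the only projections that read the text embedding, whereas W_q acts on image latents; a real implementation would use autograd rather than finite differences.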

Keywords: costume effect drawing; Hanfu; image generation; Stable Diffusion model; text-to-image generation

CLC number: TS941.2 [Light Industry Technology and Engineering: Clothing Design and Engineering]
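
The abstract reports mean KID (Kernel Inception Distance) and MMD (Maximum Mean Discrepancy) values without giving formulas. As a reference point only, the sketch below computes the standard unbiased squared-MMD estimator; KID is that estimator with the cubic polynomial kernel k(x, y) = (x·y/d + 1)^3 applied to Inception-v3 features. The Gaussian kernel and its bandwidth used for plain MMD are assumptions, since the kernel the authors used is not stated here, and the toy 2-D vectors are hypothetical stand-ins for real image features. Lower values mean the generated and reference feature distributions are closer.

```python
import math


def poly_kernel(x, y):
    """Cubic polynomial kernel used by KID: (x.y / d + 1) ** 3."""
    d = len(x)
    return (sum(a * b for a, b in zip(x, y)) / d + 1.0) ** 3


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel; an assumed choice for plain MMD."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(xs, ys, kernel):
    """Unbiased estimate of squared MMD between two feature sets (each needs >= 2 items)."""
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(kernel(x, y) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def kid(real_feats, gen_feats):
    """Single-estimate KID; reference implementations usually average over random subsets."""
    return mmd2_unbiased(real_feats, gen_feats, poly_kernel)


# Toy 2-D "features" (hypothetical stand-ins for Inception-v3 activations).
real = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]]
far = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
print(kid(real, near) < kid(real, far))  # True
print(mmd2_unbiased(real, near, rbf_kernel) < mmd2_unbiased(real, far, rbf_kernel))  # True
```

An unbiased estimate can dip slightly below zero when the two sets are very close, which is why the comparisons above are relative rather than against zero.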
