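Authors: LI Zhi [1], CHEN Yu (School of Textiles and Fashion, Shanghai University of Engineering Science, Shanghai 201620, China)
Affiliation: [1] School of Textiles and Fashion, Shanghai University of Engineering Science, Shanghai 201620, China
Source: Journal of Beijing Institute of Fashion Technology: Natural Science Edition, 2024, No. 4, pp. 90-97 (8 pages)
Funding: Eastern Scholar Program of the Shanghai Municipal Education Commission (TP2017074).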
Abstract: Generated Hanfu renderings often confuse dynasties because the costume features of each dynasty are difficult to capture accurately. To address this problem, the paper builds on the Stable Diffusion model: the feature-space vectors of text and image are matched according to the newly input text prompts, V* is used as a new token embedding, and it is optimized jointly with the cross-attention layer parameters W_k and W_v, so that training searches for the minimum of the loss function after the model has learned the new costume text features. By consulting literature and historical materials, the authors collected, organized, and added 163 costume text prompts covering the Tang, Song, and Ming dynasties. Inspection of the generated Hanfu renderings shows that the model can produce garment images that match the characteristics of each dynasty from the text prompts. Compared with three commonly used text-to-image generation algorithms that do not incorporate Hanfu model features, the images it generates are clearer and of higher quality. In the ablation experiments, the model optimizes the token V* with a specific ID and, compared with the other approaches, shows higher image alignment and lower text alignment. In the experiments on the Tang, Song, and Ming dynasties, the mean KID and MMD values are both relatively low, which indicates that the proposed model is feasible and effective for improving the generation of Hanfu renderings.
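Keywords: costume rendering; Hanfu; image generation; Stable Diffusion model; text-to-image generation
Classification: TS941.2 [Light Industry Technology and Engineering: Clothing Design and Engineering]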
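The abstract reports MMD (maximum mean discrepancy) values as one of its evaluation measures but does not say how they were computed. As a rough illustration only, the sketch below computes a biased squared-MMD estimate between two small sets of feature vectors. The RBF kernel, the bandwidth `sigma`, the biased estimator, and the function names `rbf_kernel` and `mmd_squared` are all assumptions made for this example and are not taken from the paper; in practice the vectors would come from an image feature extractor rather than being written by hand.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    where each mean runs over all pairs, including identical pairs.
    """
    def mean_kernel(ps: Sequence[Vector], qs: Sequence[Vector]) -> float:
        total = sum(rbf_kernel(p, q, sigma) for p in ps for q in qs)
        return total / (len(ps) * len(qs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    # Hypothetical toy feature vectors standing in for real and generated images.
    real = [(0.0, 0.0), (1.0, 0.0)]
    generated = [(0.2, 0.1), (0.9, -0.1)]
    print(mmd_squared(real, generated))
```

A lower value means the two samples are harder to tell apart under the chosen kernel, which is consistent with the abstract's reading of low MMD as a favorable result.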