Authors: LI Jiyuan (李纪远); GUAN Zheyu (管哲予); SONG Haichuan (宋海川); TAN Xin (谭鑫); MA Lizhuang (马利庄)
Affiliation: [1] School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
Source: Journal of Graphics (《图学学报》), 2025, Issue 2, pp. 382-392 (11 pages)
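Funding: National Natural Science Foundation of China (62302167, 62222602); Shanghai Sailing Program for young scientific and technological talents (23YF1410500).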
Abstract: Compared with other types of generated images, logos are highly abstract, diverse in design, and unified in style, which makes the generated results difficult to control directly. To efficiently generate logos that fit the characteristics of different industries and meet the requirements of various design compositions, a human-in-the-loop, field-specific logo generation method is proposed. First, using Dreambooth to fine-tune a text-to-image diffusion model, the text-to-image model Stable Diffusion XL was taken as the base model and trained on a dataset of logos collected from publicly available online sources, producing a "prototype model" for basic logo generation. Then, several groups of text-prompt lexicons were constructed for the target industries, and the prototype model generated logos for each target industry under the guidance of these lexicons. Next, the generated results were filtered through human intervention to derive secondary datasets tailored to industry needs. Finally, the prototype model was iteratively fine-tuned with LoRA on the secondary datasets, yielding the "final model" for logo generation. The outputs of the final model were evaluated using the cosine similarity between generated images and their prompts, together with manual questionnaire indicators. The evaluation showed that, compared with logos generated directly by the original model without this processing, logos generated by the final model improved considerably in industry relevance, structural integrity, and aesthetic quality.
Keywords: image generation; diffusion models; human-in-the-loop; training set construction; text-to-image synthesis
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
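The abstract scores generated logos partly by the cosine similarity between each generated image and its prompt. The following is a minimal sketch of that score, assuming both the image and the prompt have already been mapped to embedding vectors of the same dimension by a joint image-text encoder (for example CLIP; the abstract does not name the encoder). The helper names and the toy vectors are hypothetical illustrations, not code or data from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def mean_prompt_alignment(image_embeddings, prompt_embedding) -> float:
    """Average image-prompt cosine similarity over a batch of generated logos."""
    scores = [cosine_similarity(img, prompt_embedding) for img in image_embeddings]
    return sum(scores) / len(scores)


# Hypothetical toy embeddings standing in for encoder outputs (not paper data).
prompt = [1.0, 0.0, 1.0]
off_prompt_images = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
on_prompt_images = [[1.0, 0.0, 1.0], [1.0, 0.2, 0.8]]

print(mean_prompt_alignment(off_prompt_images, prompt))
print(mean_prompt_alignment(on_prompt_images, prompt))
```

Averaging over a batch reflects that a single sample says little about a model; in a real comparison, the same prompts would be fed to both the original model and the final model and their mean scores compared alongside the questionnaire results.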