Improved Text-to-Face Generation Method Based on Multi-Model Fusion


Authors: HUANG Wanxin; LU Tianliang [1]; YUAN Mengjiao; GENG Haoqi; CHEN Yonghao (School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China)

Affiliation: [1] School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China

Source: Journal of People's Public Security University of China (Science and Technology), 2025, No. 1, pp. 69-81 (13 pages)

Funding: Double First-Class Innovation Research Special Project in Cyberspace Security Law Enforcement Technology, People's Public Security University of China (2023SYL07).

Abstract: Text-to-face generation is a technique that creates specific facial images from textual descriptions, with significant application potential in fields such as criminal investigation and virtual reality. However, current mainstream text-to-face generation methods face several challenges, including low face-text matching accuracy, difficulty in controlling the diverse styles of generated facial images, and high development costs. To address these challenges, the fusion of diffusion models with generative adversarial networks (GANs) is investigated, and a multi-model fusion approach is proposed to improve text-to-face generation. First, the diffusion model is fine-tuned using multiple techniques to enhance the domain-customization and text-image understanding capabilities of a general-purpose large model. Then, a VAE-InverseGAN decoder is introduced, which maps the latent variables output by the diffusion model into StyleGAN2's rich facial prior space to generate high-quality facial images. Through a series of qualitative and quantitative analyses, the method demonstrates improved style controllability on the MM-CelebA-HQ dataset. Compared with baseline models, it improves facial text-feature comprehension by 12.0%, increases the face-text matching metric CLIP-Score by 33.8%, and optimizes the image quality metric NIQE by 4.0%.
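The abstract's central architectural idea, i.e. a learned decoder that maps the diffusion model's output latents into StyleGAN2's facial prior (W+) space, can be sketched with a toy mapper. This is a minimal illustrative sketch only: the 4×64×64 latent shape, the 18×512 W+ code, and the two-layer MLP are assumptions typical of diffusion/StyleGAN2 pipelines, not the paper's actual VAE-InverseGAN design.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.2):
    """Elementwise leaky ReLU activation."""
    return np.where(x > 0, x, slope * x)

class LatentMapper:
    """Toy two-layer MLP mapping a flattened diffusion latent to a
    StyleGAN2-style W+ code (n_ws x w_dim). All sizes are illustrative."""
    def __init__(self, in_dim=4 * 64 * 64, hidden=1024, n_ws=18, w_dim=512):
        scale = lambda d: 1.0 / np.sqrt(d)  # simple fan-in init
        self.w1 = rng.normal(0.0, scale(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, scale(hidden), (hidden, n_ws * w_dim))
        self.b2 = np.zeros(n_ws * w_dim)
        self.n_ws, self.w_dim = n_ws, w_dim

    def __call__(self, z):
        # Flatten the spatial latent, project through the hidden layer,
        # then reshape into per-layer style vectors for the generator.
        h = leaky_relu(z.reshape(z.shape[0], -1) @ self.w1 + self.b1)
        w = h @ self.w2 + self.b2
        return w.reshape(-1, self.n_ws, self.w_dim)

# A batch of 2 diffusion-style latents mapped into the assumed W+ space.
z = rng.normal(size=(2, 4, 64, 64))
mapper = LatentMapper()
w_plus = mapper(z)
print(w_plus.shape)  # (2, 18, 512)
```

In the full system described by the abstract, the resulting W+ codes would be fed to a pretrained StyleGAN2 generator, so image quality is inherited from its facial prior rather than learned from scratch.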

Keywords: text-to-face generation; diffusion model; large-model fine-tuning; multi-model fusion; multimodality

Classification Code: D035.39 [Politics and Law — Political Science]
