Improved Text-to-Face Generation Method Based on Multi-Model Fusion


Authors: HUANG Wanxin; LU Tianliang [1]; YUAN Mengjiao; GENG Haoqi; CHEN Yonghao (School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China)

Affiliation: [1] School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China

Source: Journal of People's Public Security University of China (Science and Technology), 2025, No. 1, pp. 69-81 (13 pages)

Funding: Double First-Class Innovation Research Special Project in Cyberspace Security Law Enforcement Technology, People's Public Security University of China (2023SYL07).

Abstract: Text-to-face generation is a technique that creates specific facial images from textual descriptions, with significant application potential in fields such as criminal investigation and virtual reality. However, current mainstream text-to-face generation methods face several challenges, including low face-text matching accuracy, difficulty in controlling the diverse styles of generated facial images, and high development costs. To address these challenges, the fusion of diffusion models with generative adversarial networks (GANs) is investigated, and a multi-model fusion approach is proposed to improve text-to-face generation. First, the diffusion model is fine-tuned using multiple techniques to enhance the domain-customization and text-image understanding capabilities of a general-purpose large model. Then, a VAE-InverseGAN decoder is introduced, which maps the latent variables output by the diffusion model into StyleGAN2's rich facial prior space to generate high-quality facial images. Through a series of qualitative and quantitative analyses, the method demonstrates improved style controllability on the MM-CelebA-HQ dataset. Compared with baseline models, it improves facial text-feature comprehension by 12.0%, increases the face-text matching metric CLIP-Score by 33.8%, and optimizes the image quality metric NIQE by 4.0%.
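The abstract's central architectural idea, i.e. a learned decoder that maps the diffusion model's output latents into StyleGAN2's facial prior (W+) space, can be sketched with a toy mapper. This is a minimal illustrative sketch only: the 4×64×64 latent shape, the 18×512 W+ code, and the two-layer MLP are assumptions typical of diffusion/StyleGAN2 pipelines, not the paper's actual VAE-InverseGAN design.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.2):
    """Elementwise leaky ReLU activation."""
    return np.where(x > 0, x, slope * x)

class LatentMapper:
    """Toy two-layer MLP mapping a flattened diffusion latent to a
    StyleGAN2-style W+ code (n_ws x w_dim). All sizes are illustrative."""
    def __init__(self, in_dim=4 * 64 * 64, hidden=1024, n_ws=18, w_dim=512):
        scale = lambda d: 1.0 / np.sqrt(d)  # simple fan-in init
        self.w1 = rng.normal(0.0, scale(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, scale(hidden), (hidden, n_ws * w_dim))
        self.b2 = np.zeros(n_ws * w_dim)
        self.n_ws, self.w_dim = n_ws, w_dim

    def __call__(self, z):
        # Flatten the spatial latent, project through the hidden layer,
        # then reshape into per-layer style vectors for the generator.
        h = leaky_relu(z.reshape(z.shape[0], -1) @ self.w1 + self.b1)
        w = h @ self.w2 + self.b2
        return w.reshape(-1, self.n_ws, self.w_dim)

# A batch of 2 diffusion-style latents mapped into the assumed W+ space.
z = rng.normal(size=(2, 4, 64, 64))
mapper = LatentMapper()
w_plus = mapper(z)
print(w_plus.shape)  # (2, 18, 512)
```

In the full system described by the abstract, the resulting W+ codes would be fed to a pretrained StyleGAN2 generator, so image quality is inherited from its facial prior rather than learned from scratch.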

Keywords: text-to-face generation; diffusion model; large-model fine-tuning; multi-model fusion; multimodality

Classification Code: D035.39 [Politics and Law — Political Science]
