Authors: HUANG Wanxin (黄万鑫); LU Tianliang (芦天亮)[1]; YUAN Mengjiao (袁梦娇); GENG Haoqi (耿浩琦); CHEN Yonghao (陈咏豪) (School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China)
Affiliation: [1] School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
Source: Journal of People's Public Security University of China (Science and Technology), 2025, No. 1, pp. 69-81 (13 pages)
Funding: Special Project for Double First-Class Innovation Research in Cyberspace Security Law Enforcement Technology, People's Public Security University of China (2023SYL07).
Abstract: Text-to-face generation is a technique that creates specific facial images from textual descriptions, with significant potential applications in fields such as criminal investigation and virtual reality. Current mainstream text-to-face methods face several challenges, including low face-text matching accuracy, difficulty in controlling the diverse styles of generated faces, and high development costs. To address these challenges, the fusion of diffusion models with generative adversarial networks (GANs) is investigated, and a multi-model fusion approach is proposed to improve text-to-face generation. First, the diffusion model is fine-tuned with multiple techniques to strengthen the general-purpose large model's domain-customization and text-image understanding capabilities. Then, a VAE-InverseGAN decoder is introduced, which maps the latent variables output by the diffusion model into StyleGAN2's rich facial prior space to generate high-quality faces. A series of qualitative and quantitative analyses shows that this method improves style controllability on the MM-CelebA-HQ dataset: compared to baseline models, face-text feature comprehension improves by 12.0%, the face-text matching metric CLIP-Score increases by 33.8%, and the no-reference image quality metric NIQE improves by 4.0%.
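The abstract describes the VAE-InverseGAN decoder only at the level of its role: mapping the diffusion model's latent output into StyleGAN2's facial prior space. The sketch below is one plausible shape for such a mapper, assuming a Stable-Diffusion-style 4×64×64 VAE latent and an FFHQ-style StyleGAN2 W+ space of 18×512; the MLP design, all dimensions, and layer choices are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class VAEInverseGANDecoder(nn.Module):
    """Hypothetical sketch: regress StyleGAN2 W+ style codes (num_ws x w_dim)
    from a diffusion VAE latent (4 x 64 x 64). Dimensions are assumptions."""

    def __init__(self, latent_shape=(4, 64, 64), num_ws=18, w_dim=512):
        super().__init__()
        in_dim = latent_shape[0] * latent_shape[1] * latent_shape[2]
        self.num_ws, self.w_dim = num_ws, w_dim
        self.mapper = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 2048), nn.LeakyReLU(0.2),
            nn.Linear(2048, 2048), nn.LeakyReLU(0.2),
            nn.Linear(2048, num_ws * w_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (B, 4, 64, 64) latent from the fine-tuned diffusion model's VAE
        w_plus = self.mapper(z).view(-1, self.num_ws, self.w_dim)
        # w_plus would then drive a frozen, pretrained StyleGAN2 synthesis
        # network, which supplies the rich facial prior the abstract refers to.
        return w_plus
```

Under this reading, the diffusion model handles text conditioning while the frozen StyleGAN2 generator guarantees face quality, which matches the abstract's stated division of labor.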
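For the evaluation, CLIP-Score is the standard face-text matching measure: cosine similarity between CLIP embeddings of the generated image and its prompt. A minimal sketch using Hugging Face transformers follows; the backbone openai/clip-vit-base-patch32 is an assumption, as the paper's excerpt does not state which CLIP model was used.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Backbone choice is an assumption for illustration.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, text: str) -> float:
    """Cosine similarity between CLIP image and text embeddings."""
    inputs = processor(text=[text], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ txt_emb.T).item()

# Usage: clip_score(Image.open("face.png"), "a young woman with blond hair")
```

NIQE, the other reported metric, is a no-reference image quality score where lower values indicate better quality, so the reported 4.0% improvement corresponds to a reduction in NIQE.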