Facial Image Generation Based on Collaborative Control of Text and Key Points (cited by: 1)

Authors: 刘宇同 (LIU Yu-Tong); 王一丁 (WANG Yi-Ding) [1]

Affiliation: [1] School of Information Science and Technology, North China University of Technology, Beijing 100144, China

Source: Computer Systems & Applications (《计算机系统应用》), 2024, No. 10, pp. 174-182 (9 pages)

Funding: National Natural Science Foundation of China (62276018).

Abstract: Face image generation places high demands on the realism and controllability of the generated faces. This study proposes a face image generation algorithm jointly controlled by text and facial key points. The text constrains the generated face at the semantic level, while the facial key points let the model control attributes such as face shape, expression, and details according to the given facial position information. The proposed algorithm improves on an existing diffusion model and introduces three additional components: a text processing module (CM), a keypoint control network (KCN), and an autoencoder network (ACN). Specifically, the diffusion model is a noise inference algorithm based on diffusion theory; CM is designed around an attention mechanism and encodes and stores text information; KCN receives the position information of the key points, which enhances the controllability of the generated face; and ACN relieves the generation burden on the diffusion model and reduces the time needed to generate samples. In addition, to suit the face image generation task, this study constructs a dataset of 30,000 face images. Given a piece of conditioning text and a facial keypoint image, the model extracts feature information from the text and position information from the key points, and generates a highly realistic and controllable target face image. Compared with current mainstream methods, the proposed algorithm improves the FID metric by about 5% to 23% and the IS metric by about 3% to 14%, demonstrating its superiority.
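The abstract describes the diffusion model only as "a noise inference algorithm based on diffusion theory" and gives no formulas. As a rough illustration of what such a model learns to invert, the sketch below implements the standard DDPM-style forward noising step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. This is an assumption about the general family of methods, not the paper's implementation: the function names, the linear schedule, and the values `1e-4` and `0.02` are illustrative choices, and the CM, KCN, and ACN components are not modeled here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (common defaults, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = product over s <= t of (1 - beta_s), for every t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) and return it with the noise that was used.

    x0 is a flat list of floats standing in for image (or latent) values.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * value + sigma * eps for value, eps in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    rng = random.Random(0)  # fixed seed so the run is repeatable
    xt, noise = add_noise([0.5, -0.25, 1.0], 500, alpha_bars, rng)
    print(len(xt), len(noise))
```

In a conditional setup like the one the abstract outlines, a denoising network would be trained to predict `noise` from `xt`, the step index, and the conditioning signals (text features and keypoint positions); how the paper wires those signals in is not stated in this record.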

Keywords: face generation; diffusion model; generative artificial intelligence; text encoding; autoencoder

Classification code: TP391.41 [Automation and Computer Technology - Computer Application Technology]
