Text-to-Single Image Method Based on Self-Attention

Cited by: 7

Authors: JU Sibo; XU Jing [1]; LI Yanfang [1] (School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China)

Affiliation: [1] School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022

Source: Computer Engineering and Applications (《计算机工程与应用》), 2022, No. 3, pp. 249-258 (10 pages)

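Funding: Chinese Academy of Engineering Academy-Local Cooperation Project (2019-JL-4-2); Jilin Province Science and Technology Development Plan Project (20170307002GX).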

Abstract: Image synthesis from natural-language descriptions has become a research hotspot in artificial intelligence. With the help of generative adversarial networks (GANs), the field has made considerable progress in high-resolution image synthesis. However, synthesized single-target images still fall short in realism: in bird image synthesis, for example, anomalies such as "multi-head" and "multi-mouth" birds appear. To address this problem, SA-AttnGAN, a text-to-single-target-image model based on the self-attention mechanism, is proposed. SA-AttnGAN refines the text features into word-level and sentence-level features to improve the semantic alignment between text and image; it applies self-attention in the initial stage of AttnGAN to improve the stability of the text-to-image model; and it stacks multi-stage GANs to synthesize the final high-resolution image. Experimental results show that SA-AttnGAN outperforms the compared models on Inception Score and Fréchet Inception Distance. Analysis of the synthesized images shows that the model not only learns background and color information but also correctly captures the structure of parts such as the bird's head and mouth, alleviating the "multi-head" and "multi-mouth" errors produced by AttnGAN. In addition, SA-AttnGAN is successfully applied to clothing image synthesis from Chinese descriptions, which indicates good generalization ability.
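The abstract does not give the exact layer used, so the following is only a minimal sketch of the general idea of self-attention over an image feature map, assuming a SAGAN-style formulation (softmax over all spatial positions plus a residual scaled by `gamma`). The function names, the projection matrices `w_q`, `w_k`, `w_v`, and the toy input are all illustrative assumptions, not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(features, w_q, w_k, w_v, gamma=1.0):
    """Hypothetical self-attention over a flattened feature map.

    features: N x C matrix (N spatial positions, C channels).
    w_q, w_k: C x d projections; w_v: C x C projection (assumed shapes).
    Returns the residual output (N x C) and the N x N attention map.
    """
    q = matmul(features, w_q)
    k = matmul(features, w_k)
    v = matmul(features, w_v)
    scores = matmul(q, transpose(k))          # similarity of every position pair
    attn = [softmax(row) for row in scores]   # each row sums to 1
    context = matmul(attn, v)                 # weighted mix of all positions
    out = [[f + gamma * c for f, c in zip(f_row, c_row)]
           for f_row, c_row in zip(features, context)]
    return out, attn


# Toy example: 3 spatial positions, 2 channels, identity projections.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(features, identity, identity, identity, gamma=0.5)
```

Because every position attends to every other position, such a layer can in principle relate distant parts of an object (for instance, a head and a beak), which is one plausible reading of why the authors report fewer structural errors. With `gamma=0.0` the layer reduces to the identity, so it can be introduced without disturbing an existing generator.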

Keywords: text-to-image generation; generative adversarial network; deep learning; computer vision; artificial intelligence

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]
