Authors: JU Sibo (鞠思博); XU Jing (徐晶) [1]; LI Yanfang (李岩芳) [1]
Affiliation: [1] School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2022, No. 3, pp. 249-258 (10 pages)
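Funding: Chinese Academy of Engineering Academy-Local Cooperation Project (2019-JL-4-2); Jilin Province Science and Technology Development Plan Project (20170307002GX).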
Abstract: Image synthesis from natural-language descriptions has become a research hotspot in artificial intelligence. With the help of generative adversarial networks (GANs), the field has made considerable progress in high-resolution image synthesis. However, synthesized single-target images still fall short in realism; bird images, for example, can show anomalies such as multiple heads or multiple beaks. To address this problem, SA-AttnGAN, a text-to-image model for single-target synthesis based on the self-attention mechanism, is proposed. SA-AttnGAN refines text features into word-level and sentence-level features to improve text-image semantic alignment; applies self-attention in the initial stage of AttnGAN to improve the stability of the text-to-image model; and stacks multi-stage GANs to synthesize the final high-resolution image. Experiments show that SA-AttnGAN outperforms the compared models on Inception Score and Fréchet Inception Distance. Analysis of the synthesized images shows that the model learns not only background and color information but also correctly captures the structural information of parts such as the bird's head and beak, alleviating the "multi-head" and "multi-mouth" errors produced by AttnGAN. In addition, SA-AttnGAN is successfully applied to clothing image synthesis from Chinese descriptions, indicating good generalization ability.
Keywords: text-to-image generation; generative adversarial networks; deep learning; computer vision; artificial intelligence
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]