Authors: CHEN Jize; JIANG Xiaoyan; GAO Yongbin (School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China)
Affiliation: [1] School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
Source: Computer Engineering and Applications, 2023, No. 12, pp. 208-216 (9 pages)
Funding: Key Program of the National Natural Science Foundation of China (U2033218).
Abstract: To address the problems of traditional text-to-image generation methods, namely monotonous local textures, unclear edge details, and mismatch with the input text description, RAGAN, a text-to-image method based on an attention model with a gate mechanism, is proposed. To solve the inability of traditional methods to generate fine-grained images, an attention network augmented with a gate mechanism filters out the relevant word vectors and combines them with the intermediate hidden vectors to form new hidden vectors; the adversarial game of the generative adversarial network then drives the generator to produce images with richer textures and sharper object edges, improving image quality. To address generated images that do not match the input text description, text reconstruction extracts the deep semantic features embedded in the generated image and compares them with the semantic features of the input text, and a reconstruction loss is defined to improve semantic consistency. Compared with the baseline model, the Inception Score and R-precision improve by 9.17% and 8.3% respectively on the CUB dataset, and by 13.67% and 5.56% respectively on the COCO dataset, demonstrating that the proposed model effectively improves the realism and artistry of the generated images while maintaining semantic consistency.
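The paper itself is not reproduced in this record, but the two mechanisms the abstract describes can be illustrated with a minimal NumPy sketch. The specific gate form (sigmoid of an elementwise interaction) and the cosine-based reconstruction loss below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(hidden, word_vecs):
    """Sketch of gated attention: attend over the word vectors, then use a
    gate to decide how much of the attended context replaces the
    intermediate hidden vector (forming the "new hidden vector")."""
    scores = softmax(word_vecs @ hidden)        # (T,) relevance of each word
    context = scores @ word_vecs                # (D,) attended word context
    # Hypothetical gate: sigmoid of the elementwise hidden/context interaction
    gate = 1.0 / (1.0 + np.exp(-(hidden * context)))
    return gate * context + (1.0 - gate) * hidden

def reconstruction_loss(text_feat, recon_feat):
    """Semantic-consistency loss: 1 - cosine similarity between the input
    text features and features re-extracted from the generated image."""
    denom = np.linalg.norm(text_feat) * np.linalg.norm(recon_feat) + 1e-8
    return 1.0 - (text_feat @ recon_feat) / denom
```

A perfectly reconstructed semantic feature drives the loss toward zero, while an unrelated or opposing feature drives it toward two, which is the behavior the abstract's semantic-consistency objective relies on.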
Keywords: attention mechanism; convolutional neural network; generative adversarial network; deep learning
Classification Code: TP391 [Automation and Computer Technology: Computer Application Technology]