Authors: CHEN Jize; JIANG Xiaoyan; GAO Yongbin (School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China)
Affiliation: [1] School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
Source: Computer Engineering and Applications, 2023, No. 12, pp. 208-216 (9 pages)
Funding: Key Program of the National Natural Science Foundation of China (U2033218).
Abstract: To address the problems of traditional text-to-image generation methods, namely monotonous local textures, unclear edge details, and mismatch with the input text description, RAGAN, a text-to-image method based on an attention model with a gate mechanism, is proposed. To solve the inability of traditional methods to generate fine-grained images, an attention network augmented with a gate mechanism filters out the relevant word vectors and combines them with the intermediate hidden vectors to form new hidden vectors; the adversarial game of the generative adversarial network then drives the generator to produce images with richer textures and sharper object edges, improving image quality. To address generated images that do not match the input text description, text reconstruction extracts the deep semantic features embedded in the generated image and compares them with the semantic features of the input text, and a reconstruction loss is defined to improve semantic consistency. Compared with the baseline model, the Inception Score and R-precision improve by 9.17% and 8.3% respectively on the CUB dataset, and by 13.67% and 5.56% respectively on the COCO dataset, demonstrating that the proposed model effectively improves the realism and artistry of the generated images while maintaining semantic consistency.
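The paper itself is not reproduced in this record, but the two mechanisms the abstract describes can be illustrated with a minimal NumPy sketch. The specific gate form (sigmoid of an elementwise interaction) and the cosine-based reconstruction loss below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(hidden, word_vecs):
    """Sketch of gated attention: attend over the word vectors, then use a
    gate to decide how much of the attended context replaces the
    intermediate hidden vector (forming the "new hidden vector")."""
    scores = softmax(word_vecs @ hidden)        # (T,) relevance of each word
    context = scores @ word_vecs                # (D,) attended word context
    # Hypothetical gate: sigmoid of the elementwise hidden/context interaction
    gate = 1.0 / (1.0 + np.exp(-(hidden * context)))
    return gate * context + (1.0 - gate) * hidden

def reconstruction_loss(text_feat, recon_feat):
    """Semantic-consistency loss: 1 - cosine similarity between the input
    text features and features re-extracted from the generated image."""
    denom = np.linalg.norm(text_feat) * np.linalg.norm(recon_feat) + 1e-8
    return 1.0 - (text_feat @ recon_feat) / denom
```

A perfectly reconstructed semantic feature drives the loss toward zero, while an unrelated or opposing feature drives it toward two, which is the behavior the abstract's semantic-consistency objective relies on.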
Keywords: attention mechanism; convolutional neural network; generative adversarial network; deep learning
Classification Code: TP391 [Automation and Computer Technology: Computer Application Technology]