Semantic Spatial Awareness and Attention-based Text-to-Image Generation Method

Authors: OUYANG An-jie; SUN Da-meng; HE Li-ming[1] (School of Information Engineering, Chang'an University, Xi'an 710018, China)

Affiliation: [1] School of Information Engineering, Chang'an University, Xi'an 710018, Shaanxi, China

Source: Computer Technology and Development, 2025, No. 3, pp. 109-116 (8 pages)

Funding: Key Research and Development Program of Shaanxi Province (2022GY-030, 2022GY-039).

Abstract: In text-to-image generation, the generated image often fails to match the text description, and image quality is frequently poor. To improve the match between the text and the generated image and to produce higher-quality images, this paper proposes a novel generative adversarial network model, WSA-GAN. The word-level embedding vectors produced by the text encoder are fused with the hidden image features through a cross-attention method and a confidence-based feature fusion method, effectively integrating word-level semantic features into the image features. At the same time, the semantic spatial-aware convolution module (SSACN) is introduced and improved: ordinary convolutions are replaced with depthwise separable convolutions to reduce the number of model parameters and lower model complexity, and a mixed self-attention and convolution layer (ACMix) is used to capture the relationships among pixels in the image features, modeling long-range dependencies between features while keeping model complexity under control, so that the model can capture broader contextual information. This improves image quality and, at the same time, the alignment between the text and the generated image. Experiments on the CUB-200-2011 dataset show that, compared with mainstream models, both generation quality and text-image alignment are improved to some extent.
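The abstract describes two architectural changes concretely enough to illustrate in code: replacing ordinary convolutions with depthwise separable convolutions to cut parameters, and fusing word-level text embeddings with image hidden features via cross-attention. The following is a minimal PyTorch sketch of those two ideas only; the class names (DepthwiseSeparableConv, WordImageCrossAttention), the residual fusion, and all tensor sizes are illustrative assumptions and do not reproduce the paper's actual WSA-GAN, SSACN, confidence fusion, or ACMix implementation.

```python
# Minimal sketch of two ideas from the abstract, assuming PyTorch:
# (1) a depthwise separable convolution as a lighter drop-in for a dense 3x3 conv,
# (2) word-level cross-attention that injects text semantics into image features.
# Class and argument names are illustrative, not the paper's WSA-GAN code.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv; far fewer
    parameters than a dense 3x3 conv with the same channel counts."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class WordImageCrossAttention(nn.Module):
    """Every image pixel attends over the word embeddings; the attended word
    context is added back onto the image features (a generic stand-in for the
    paper's cross-attention plus confidence feature fusion step)."""

    def __init__(self, img_ch, word_dim, attn_dim=128):
        super().__init__()
        self.q = nn.Conv2d(img_ch, attn_dim, 1)   # queries from image features
        self.k = nn.Linear(word_dim, attn_dim)    # keys from word embeddings
        self.v = nn.Linear(word_dim, img_ch)      # values projected to image channels

    def forward(self, img_feat, word_emb):
        # img_feat: (B, C, H, W); word_emb: (B, T, word_dim)
        B, C, H, W = img_feat.shape
        q = self.q(img_feat).flatten(2).transpose(1, 2)          # (B, H*W, attn_dim)
        k, v = self.k(word_emb), self.v(word_emb)                # (B, T, attn_dim), (B, T, C)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, C, H, W)     # word context per pixel
        return img_feat + ctx                                    # residual fusion


if __name__ == "__main__":
    img = torch.randn(2, 64, 16, 16)   # dummy image hidden features
    words = torch.randn(2, 18, 256)    # dummy word embeddings, 18 tokens
    fused = WordImageCrossAttention(img_ch=64, word_dim=256)(img, words)
    out = DepthwiseSeparableConv(64, 64)(fused)
    print(out.shape)                   # torch.Size([2, 64, 16, 16])
```

The parameter saving claimed for the convolution swap is easy to check on these sizes: a dense 3x3 convolution from 64 to 64 channels needs 64*64*9 = 36 864 weights, while the depthwise (64*9 = 576) plus pointwise (64*64 = 4 096) pair uses 4 672 (biases ignored).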

Keywords: generative adversarial network; multimodal fusion; attention mechanism; text-to-image generation; deep learning

CLC number: TP391.4 [Automation and Computer Technology - Computer Application Technology]

 
