Author(s): WU Jinman; CHE Jin [1,3]; BAI Xuebing; CHEN Yumin [1,3] (School of Electronic-Electrical Engineering, Ningxia University, Yinchuan 750021, China; School of Advanced Interdisciplinary, Ningxia University, Yinchuan 750021, China; Ningxia Key Laboratory of Intelligent Sensing for Desert Information, Yinchuan 750021, China)
Affiliation(s): [1] School of Electronic-Electrical Engineering, Ningxia University, Yinchuan 750021, Ningxia, China; [2] School of Advanced Interdisciplinary, Ningxia University, Yinchuan 750021, Ningxia, China; [3] Ningxia Key Laboratory of Intelligent Sensing for Desert Information, Yinchuan 750021, Ningxia, China
Source: Changjiang Information & Communications, 2024, No. 5, pp. 1-4 (4 pages)
Funding: National Natural Science Foundation of China (No. 62366042); Natural Science Foundation of Ningxia (No. 2023AAC03127)
Abstract: To enhance the capture of relevant information in images by a Visual Question Answering (VQA) model, Visual Grounding (VG) information is introduced to augment the model's understanding of the complete image context. Semantic features from the image and shallow textual features are fed together into an image-based text encoder, mapping the textual features into the image space. The resulting textual features and the image features are then fed into a text-based image decoder to generate VG information. Experimental results demonstrate that the model achieves the best performance across four evaluation metrics: Accuracy, Open, Binary, and Consistency, with improvements of 0.84%, 0.74%, 3.38%, and 2.95% respectively. In particular, Accuracy reaches 56.94%, indicating that VG information effectively increases the proportion of question-relevant information in the image features.
CLC Number: TP389.1 [Automation and Computer Technology / Computer System Architecture]
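The two-stage pipeline the abstract describes (text features grounded into the image space by an image-based text encoder, then an image decoder producing per-region visual-grounding information) can be sketched with single-head scaled dot-product cross-attention. This is a minimal illustrative sketch, not the paper's implementation: all dimensions, the single-head attention choice, and the final per-region scoring are assumptions for demonstration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # scaled dot-product attention: each query attends over all keys/values
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
D = 64                                      # shared feature dimension (assumed)
img_feats = rng.standard_normal((49, D))    # e.g. 7x7 grid of image patch features
txt_feats = rng.standard_normal((12, D))    # shallow token features of the question

# Stage 1 ("image-based text encoder"): text tokens attend over image
# features, mapping the text representation into the image space.
txt_in_img_space = cross_attention(txt_feats, img_feats, img_feats)

# Stage 2 ("text-based image decoder"): image regions attend over the
# grounded text; a per-region relevance distribution stands in for the
# visual-grounding information (illustrative scoring, not from the paper).
grounded = cross_attention(img_feats, txt_in_img_space, txt_in_img_space)
vg_scores = softmax((grounded * img_feats).sum(axis=-1))

print(vg_scores.shape)  # one relevance weight per image region
```

The point of the sketch is the data flow: the question never attends over raw pixels directly; it is first projected into the image feature space, and only then used to weight image regions, which is what lets the grounding step raise the share of question-relevant information in the image features.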