Author(s): WU Jinman; CHE Jin [1,3]; BAI Xuebing; CHEN Yumin [1,3] (School of Electronic-Electrical Engineering, Ningxia University, Yinchuan 750021, China; School of Advanced Interdisciplinary, Ningxia University, Yinchuan 750021, China; Ningxia Key Laboratory of Intelligent Sensing for Desert Information, Yinchuan 750021, China)
Affiliation(s): [1] School of Electronic-Electrical Engineering, Ningxia University, Yinchuan 750021, Ningxia, China; [2] School of Advanced Interdisciplinary, Ningxia University, Yinchuan 750021, Ningxia, China; [3] Ningxia Key Laboratory of Intelligent Sensing for Desert Information, Yinchuan 750021, Ningxia, China
Source: Changjiang Information & Communications, 2024, No. 5, pp. 1-4 (4 pages)
Funding: National Natural Science Foundation of China (No. 62366042); Natural Science Foundation of Ningxia (No. 2023AAC03127)
Abstract: To enhance the capture of relevant information in images by a Visual Question Answering (VQA) model, Visual Grounding (VG) information is introduced to augment the model's understanding of the complete image context. Semantic features from the image and shallow textual features are fed together into an image-based text encoder, mapping the textual features into the image space. The resulting textual features and the image features are then fed into a text-based image decoder to generate VG information. Experimental results demonstrate that the model achieves the best performance across four evaluation metrics: Accuracy, Open, Binary, and Consistency, with improvements of 0.84%, 0.74%, 3.38%, and 2.95% respectively. In particular, Accuracy reaches 56.94%, indicating that VG information effectively increases the proportion of question-relevant information in the image features.
CLC Number: TP389.1 [Automation and Computer Technology / Computer System Architecture]
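The two-stage pipeline the abstract describes (text features grounded into the image space by an image-based text encoder, then an image decoder producing per-region visual-grounding information) can be sketched with single-head scaled dot-product cross-attention. This is a minimal illustrative sketch, not the paper's implementation: all dimensions, the single-head attention choice, and the final per-region scoring are assumptions for demonstration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # scaled dot-product attention: each query attends over all keys/values
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
D = 64                                      # shared feature dimension (assumed)
img_feats = rng.standard_normal((49, D))    # e.g. 7x7 grid of image patch features
txt_feats = rng.standard_normal((12, D))    # shallow token features of the question

# Stage 1 ("image-based text encoder"): text tokens attend over image
# features, mapping the text representation into the image space.
txt_in_img_space = cross_attention(txt_feats, img_feats, img_feats)

# Stage 2 ("text-based image decoder"): image regions attend over the
# grounded text; a per-region relevance distribution stands in for the
# visual-grounding information (illustrative scoring, not from the paper).
grounded = cross_attention(img_feats, txt_in_img_space, txt_in_img_space)
vg_scores = softmax((grounded * img_feats).sum(axis=-1))

print(vg_scores.shape)  # one relevance weight per image region
```

The point of the sketch is the data flow: the question never attends over raw pixels directly; it is first projected into the image feature space, and only then used to weight image regions, which is what lets the grounding step raise the share of question-relevant information in the image features.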