Coreference resolution helps visual dialogs to focus  


Authors: Tianwei Yue, Wenping Wang, Chen Liang, Dachi Chen, Congrui Hetang, Xuewei Wang

Affiliation: [1] Carnegie Mellon University, Pittsburgh 15213, USA

Source: High-Confidence Computing, 2024, Issue 2, pp. 129-135 (7 pages)

Abstract: Visual Dialog is a multi-modal task involving both computer vision and dialog systems. The goal is to answer multiple questions in conversation style, given an image as the context. Neural networks with attention modules are widely used for this task because of their effectiveness in reasoning about the relevance between text and images. In this work, we study how to further improve the quality of such reasoning, which is an open challenge. Our baseline is the Recursive Visual Attention (RVA) model, which refines the vision-text attention by iteratively visiting the dialog history. Building on top of that, we propose to improve the attention mechanism with contrastive learning. We train a Matching-Aware Attention Kernel (MAAK) by aligning the deep feature embeddings of an image and its caption, to provide better attention scores. Experiments show consistent improvements from MAAK. In addition, we study the effect of using Multimodal Compact Bilinear (MCB) pooling as a three-way feature fusion for the visual, textual and dialog history embeddings. We analyze the performance of both methods in the discussion section, and propose further ideas to resolve current limitations.
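The abstract does not specify MAAK's exact contrastive objective. As an illustration only, aligning image and caption embeddings is commonly done with a symmetric InfoNCE-style loss, where each matched image-caption pair in a batch is the positive and the remaining pairs serve as negatives. The function name, temperature value, and loss form below are assumptions, not the paper's implementation:

```python
import numpy as np

def info_nce_loss(img_emb, cap_emb, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    `img_emb` and `cap_emb` are (batch, dim) arrays where row i of each
    array forms a matched image-caption pair; all other rows in the
    batch act as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    cap = cap_emb / np.linalg.norm(cap_emb, axis=1, keepdims=True)
    logits = (img @ cap.T) / temperature  # (batch, batch) similarity matrix

    def cross_entropy_diag(l):
        # Mean cross-entropy with the diagonal (matched pair) as the target,
        # computed with a max-shift for numerical stability.
        m = l.max(axis=1, keepdims=True)
        lse = np.log(np.exp(l - m).sum(axis=1)) + m[:, 0]
        return float(np.mean(lse - np.diag(l)))

    # Average the image-to-caption and caption-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

The loss is near zero when each image embedding matches only its own caption, and grows when captions are mismatched, which is the alignment signal a matching-aware attention kernel could be trained on.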

Keywords: Multi-modal machine learning; Visual dialog; Co-reference resolution

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]

 
