Relevant Visual Semantic Context-Aware Attention-Based Dialog

Authors: Eugene Tan Boon Hong, Yung-Wey Chong, Tat-Chee Wan, Kok-Lim Alvin Yau

Affiliations: [1] National Advanced IPv6 Centre, Universiti Sains Malaysia, Penang, Malaysia; [2] Lee Kong Chian Faculty of Engineering and Science (LKCFES), Universiti Tunku Abdul Rahman, Sungai Long, Selangor, Malaysia

Source: Computers, Materials & Continua, 2023, Issue 8, pp. 2337-2354 (18 pages; English-language journal)

Abstract: The existing dataset for visual dialog comprises multiple rounds of questions and a diverse range of image contents. However, it faces challenges in overcoming visual semantic limitations, particularly in obtaining sufficient context from visual and textual aspects of images. This paper proposes a new visual dialog dataset called Diverse History-Dialog (DS-Dialog) to address the visual semantic limitations faced by the existing dataset. DS-Dialog groups relevant histories based on their respective Microsoft Common Objects in Context (MSCOCO) image categories and consolidates them for each image. Specifically, each MSCOCO image category consists of top relevant histories extracted based on the semantic relationships between the original image caption and the historical context. These relevant histories are consolidated for each image, and DS-Dialog enhances the current dataset by adding new context-aware relevant history to provide more visual semantic context for each image. The new dataset is generated through several stages, including image semantic feature extraction, keyphrase extraction, relevant question extraction, and relevant history dialog generation. The DS-Dialog dataset contains about 2.6 million question-answer pairs, where 1.3 million pairs correspond to existing VisDial's question-answer pairs, and the remaining 1.3 million pairs include a maximum of 5 image features for each VisDial image, with each image comprising 10-round relevant question-answer pairs. Moreover, a novel adaptive relevant history selection is proposed to resolve missing visual semantic information for each image. DS-Dialog is used to benchmark the performance of previous visual dialog models and achieves better performance than previous models. Specifically, the proposed DS-Dialog model achieves an 8% higher mean reciprocal rank (MRR), 11% higher R@1, 6% higher R@5, 5% higher R@10, and 8% higher normalized discounted cumulative gain (NDCG) compared to LF. DS-Dialog also achieves approximately 1 point improvement on R@k, mean, MRR, and NDCG compared to the origin
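The abstract reports results as MRR, R@k, mean rank, and NDCG. The sketch below shows how these retrieval metrics are commonly computed from the rank each model assigns to the ground-truth answer. It is not the paper's evaluation code: the function names and toy inputs are illustrative assumptions, and the NDCG cutoff (the number of candidates with non-zero relevance) follows the usual VisDial convention, which is assumed here rather than stated in the abstract.

```python
import math

def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank is the 1-based position of the true answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of questions whose true answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_rank(ranks):
    """Average 1-based rank of the true answer (lower is better)."""
    return sum(ranks) / len(ranks)

def ndcg(ranked_relevances):
    """NDCG for one question.

    ranked_relevances: relevance scores of the candidates, listed in the
    order the model ranked them. The cutoff k is the number of candidates
    with non-zero relevance (assumed VisDial-style convention).
    """
    k = sum(1 for rel in ranked_relevances if rel > 0)
    if k == 0:
        return 0.0
    ideal = sorted(ranked_relevances, reverse=True)
    dcg = sum(ranked_relevances[i] / math.log2(i + 2) for i in range(k))
    idcg = sum(ideal[i] / math.log2(i + 2) for i in range(k))
    return dcg / idcg

# Toy example: ranks of the true answer for four hypothetical questions.
toy_ranks = [1, 2, 4, 10]
print(mean_reciprocal_rank(toy_ranks))
print(recall_at_k(toy_ranks, 5))
print(mean_rank(toy_ranks))
```

A higher MRR, R@k, or NDCG indicates better ranking, while a lower mean rank is better, which is why improvements on these metrics are reported with different signs in practice.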

Keywords: visual dialog; context-aware; relevant history; computer vision; natural language processing

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]
