Deep Multi-Module Based Language Priors Mitigation Model for Visual Question Answering  


Authors: YU Shoujian, JIN Xueqin, WU Guowen, SHI Xiujin, ZHANG Hong (College of Computer Science and Technology, Donghua University, Shanghai 201620, China)

Affiliation: [1] College of Computer Science and Technology, Donghua University, Shanghai 201620, China

Source: Journal of Donghua University (English Edition), 2023, No. 6, pp. 684-694 (11 pages)

Abstract: Visual question answering (VQA) models are intended to infer answers from the information in a visual image that is relevant to the question text, but many VQA models yield answers biased by prior knowledge, especially language priors. This paper proposes a mitigation model, language priors mitigation-VQA (LPM-VQA), for the language priors problem in VQA models. It divides language priors into positive and negative language priors and uses separate network branches to capture and process each kind, thereby mitigating them. A dynamically changing language prior feedback objective function is designed using the intermediate results of some modules in the VQA model: the weight of the loss value for each answer is set dynamically according to the strength of its language priors, balancing its proportion in the total VQA loss and further mitigating the language priors. The model does not depend on the baseline VQA architecture and can be configured like a plug-in to improve the performance of most existing VQA models. Experimental results show that the proposed model is general and effective, achieving state-of-the-art accuracy on the VQA-CP v2 dataset.

Keywords: visual question answering (VQA); language priors; natural language processing; multimodal fusion; computer vision

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]; TP3-05 [Automation and Computer Technology - Computer Science and Technology]
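
The dynamically weighted objective described in the abstract can be illustrated with a minimal sketch. The snippet below is not the paper's actual formulation; it assumes a PyTorch setup with a soft-target binary cross-entropy VQA loss and a hypothetical per-answer prior-strength estimate (e.g., the answer's frequency for a question type), and it only shows how per-answer loss weights might be scaled down as the estimated language prior grows stronger. The function name prior_weighted_vqa_loss and the parameter alpha are illustrative assumptions, not names from the paper.

```python
# Hypothetical sketch of a prior-strength-weighted VQA loss (PyTorch assumed).
import torch
import torch.nn.functional as F


def prior_weighted_vqa_loss(logits, targets, answer_prior_strength, alpha=1.0):
    """Soft-target BCE VQA loss whose per-answer weight shrinks as the
    estimated language prior for that answer grows (illustrative scheme).

    logits:                (batch, num_answers) raw scores from the VQA head
    targets:               (batch, num_answers) soft answer targets in [0, 1]
    answer_prior_strength: (num_answers,) estimated prior strength in [0, 1]
    alpha:                 how aggressively strong priors are down-weighted
    """
    # Answers with stronger language priors get smaller loss weights,
    # nudging the model to rely on visual evidence instead.
    weights = 1.0 / (1.0 + alpha * answer_prior_strength)        # (num_answers,)
    per_answer = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")                       # (batch, num_answers)
    return (per_answer * weights.unsqueeze(0)).sum(dim=1).mean()


if __name__ == "__main__":
    batch, num_answers = 4, 10
    logits = torch.randn(batch, num_answers)
    targets = torch.rand(batch, num_answers)
    prior_strength = torch.rand(num_answers)  # placeholder prior estimates
    print(prior_weighted_vqa_loss(logits, targets, prior_strength).item())
```

In this sketch the weighting is fixed per answer; the paper's objective is described as dynamic, i.e., driven by intermediate results of modules in the VQA model during training, so the prior-strength estimate would be recomputed rather than held constant.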

 
