Research on Video Question Answering Based on Video Description and Reading Comprehension

Authors: Hu Jinxiang; Meng Zhaohui (School of Computer & Information, Hohai University, Nanjing 211100, China)

Source: Application Research of Computers (《计算机应用研究》), 2021, No. 12, pp. 3781-3785 (5 pages)

Abstract: Most video question answering (VideoQA) models embed the video and the question into a shared space for answer reasoning, which makes multimodal interaction difficult and retains video semantic features poorly. To address these problems, this paper proposes a video description mechanism that obtains a textual representation of the video's semantic features, thereby avoiding multimodal interaction. The method passes the video features through the description mechanism to generate the corresponding description text, lets the description-text features and the question features interact in the manner of reading comprehension, and finally infers the answer to the question. Test results on the MSVD-QA and MSRVTT-QA datasets show that the proposed model achieves higher answer accuracy than existing models, to varying degrees, which indicates that the proposed method handles the VideoQA task better.
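The data flow described in the abstract (video features, then description text, then reading-comprehension-style matching against the question) can be sketched roughly as follows. This is only an illustrative toy, not the authors' model: the `describe` callable stands in for a learned video captioner, the word-overlap matching stands in for a learned reading-comprehension module, and the names `VideoQAPipeline`, `tokenize`, `toy_describe`, and `answer_vocab` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class VideoQAPipeline:
    # Hypothetical stand-in for a captioning model:
    # maps video features to a list of description sentences.
    describe: Callable[[Sequence[float]], List[str]]
    # Assumed fixed set of candidate answer words.
    answer_vocab: List[str]

    def answer(self, video_features: Sequence[float], question: str) -> Optional[str]:
        # Step 1: turn the video into text, so later steps are text-only.
        sentences = self.describe(video_features)
        if not sentences:
            return None
        q_tokens = set(tokenize(question))
        # Step 2: crude "reading comprehension": pick the description
        # sentence that shares the most words with the question.
        best = max(sentences, key=lambda s: len(q_tokens & set(tokenize(s))))
        # Step 3: return the first candidate answer that appears in that
        # sentence and is not already part of the question.
        for tok in tokenize(best):
            if tok in self.answer_vocab and tok not in q_tokens:
                return tok
        return None


def toy_describe(video_features: Sequence[float]) -> List[str]:
    """Placeholder captioner that ignores its input (illustration only)."""
    return ["a man is playing a guitar", "a dog runs on the grass"]


pipeline = VideoQAPipeline(
    describe=toy_describe,
    answer_vocab=["guitar", "dog", "man", "grass"],
)
print(pipeline.answer([0.0, 0.1], "What is the man playing?"))
```

The point of the sketch is the design choice the abstract emphasizes: once the video is represented as text, the question only ever interacts with text, so no joint video-question embedding space is needed.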

Keywords: video question answering; video description; reading comprehension

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
