Research on Video Question Answering Based on Video Description and Reading Comprehension

Authors: Hu Jinxiang (胡锦祥); Meng Zhaohui (孟朝晖) [1]

Affiliation: [1] School of Computer & Information, Hohai University, Nanjing 211100, China

Source: Application Research of Computers (《计算机应用研究》), 2021, No. 12, pp. 3781-3785 (5 pages)

Abstract: Most video question answering (VideoQA) models embed the video and the question into the same space for answer reasoning, and therefore suffer from difficult multimodal interaction and poor retention of video semantic features. To address these problems, this paper proposes a video description mechanism that obtains a text representation of the video's semantic features, thereby avoiding multimodal interaction. The proposed method passes the video features through the description mechanism to obtain the corresponding video description text, lets the description text features and the question features interact in a reading-comprehension manner, and finally infers the answer to the question. Test results on the MSVD-QA and MSRVTT-QA datasets show that the answer accuracy of the proposed model improves to varying degrees over existing models, indicating that the proposed method can better complete the VideoQA task.

Keywords: video question answering; video description; reading comprehension

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
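
The two-stage idea in the abstract (first describe the video as text, then answer the question by reading that text) can be illustrated with a toy sketch. This is not the paper's model, whose architecture the abstract does not specify: the function names `describe_video` and `answer_question`, the (subject, action, object) triples standing in for video features, and the word-overlap heuristic standing in for learned reading comprehension are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def describe_video(clip_triples):
    """Hypothetical stand-in for a description mechanism.

    Takes (subject, action, object) triples, assumed to come from some
    upstream video model, and renders one sentence per triple.
    """
    return [f"{subj} is {action} {obj}." for subj, action, obj in clip_triples]


def answer_question(question, sentences, answer_vocab):
    """Toy reading-comprehension step over the description text.

    Picks the description sentence sharing the most tokens with the
    question, then returns the first token of that sentence that is in
    the answer vocabulary and does not already appear in the question.
    Returns None if no such token exists.
    """
    q_tokens = set(tokenize(question))
    best = max(sentences, key=lambda s: len(q_tokens & set(tokenize(s))))
    for tok in tokenize(best):
        if tok in answer_vocab and tok not in q_tokens:
            return tok
    return None


# Illustrative inputs (invented for this sketch, not taken from any dataset).
clips = [("a man", "slicing", "an onion"), ("a dog", "chasing", "a ball")]
vocab = {"man", "dog", "onion", "ball"}
description = describe_video(clips)

print(answer_question("What is the man slicing?", description, vocab))
print(answer_question("Who is chasing the ball?", description, vocab))
```

Because the question only ever meets text, no joint video-question embedding is needed in this sketch; a real system would replace both functions with trained models.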

 
