Authors: Hu Jinxiang (胡锦祥); Meng Zhaohui (孟朝晖)[1] (School of Computer & Information, Hohai University, Nanjing 211100, China)
Source: Application Research of Computers (《计算机应用研究》), 2021, No. 12, pp. 3781-3785 (5 pages)
Abstract: Most video question answering (VideoQA) models embed the video and the question into the same space for answer reasoning, and therefore face two problems: multimodal interaction is difficult, and video semantic features are poorly retained. This paper proposes a video description mechanism that obtains a text representation of the video's semantic features, thereby avoiding multimodal interaction. The proposed method passes the video features through the description mechanism to produce the corresponding video description text, lets the description-text features interact with the question features in the manner of reading comprehension, and finally infers the answer to the question. Test results on the MSVD-QA and MSRVTT-QA datasets show that the answer accuracy of the proposed model improves, to varying degrees, over existing models, indicating that the proposed method performs the VideoQA task better.
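The abstract does not specify how the description-text features and question features interact. Purely as an illustration, the following sketch assumes a BiDAF-style context-to-query attention, treating the generated video description as the reading-comprehension context and the question as the query. The toy embeddings, the function names (`embed`, `context_to_query`, `fuse`), and the example sentences are hypothetical, and the video description step is assumed to have already produced the text.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy word vectors (3-dim). A real model would use learned
# embeddings or encoder outputs. Unknown words map to the zero vector.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "man": [1.0, 0.0, 0.0],
    "guitar": [0.0, 1.0, 0.0],
    "plays": [0.0, 0.0, 1.0],
    "playing": [0.0, 0.0, 1.0],
}


def embed(text: str) -> List[Vector]:
    """Look up a vector for each whitespace-separated, lowercased token."""
    return [TOY_EMBEDDINGS.get(tok, [0.0, 0.0, 0.0]) for tok in text.lower().split()]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def context_to_query(desc: List[Vector], ques: List[Vector]) -> Tuple[List[List[float]], List[Vector]]:
    """For each description (context) token, attend over the question tokens.

    weights[i][j] is how strongly description token i attends to question
    token j; attended[i] is the attention-weighted sum of question vectors.
    """
    weights, attended = [], []
    for d in desc:
        w = softmax([dot(d, q) for q in ques])
        weights.append(w)
        attended.append([sum(wj * q[k] for wj, q in zip(w, ques)) for k in range(len(d))])
    return weights, attended


def fuse(desc: List[Vector], attended: List[Vector]) -> List[Vector]:
    """Per-token fusion by concatenation [d; a; d*a], as in BiDAF-style readers."""
    return [d + a + [x * y for x, y in zip(d, a)] for d, a in zip(desc, attended)]


if __name__ == "__main__":
    description = "a man plays guitar"  # assumed output of the description step
    question = "what is the man playing"
    desc_vecs, ques_vecs = embed(description), embed(question)
    weights, attended = context_to_query(desc_vecs, ques_vecs)
    fused = fuse(desc_vecs, attended)
    q_tokens = question.split()
    for tok, w in zip(description.split(), weights):
        # Question token this description token attends to most strongly.
        print(tok, "->", q_tokens[w.index(max(w))])
```

In a full model, the fused per-token vectors would typically be pooled and passed to a classifier over a fixed answer vocabulary. That stage, like the choice of attention, is an assumption here rather than a detail reported in the abstract.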
Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]