Authors: ZHANG Haomeng (张浩萌); LIU Bin (刘斌)
Affiliation: [1] College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2024, No. 2, pp. 470-476 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (61672279).
Abstract: Video captioning is a cross-modal task spanning computer vision and natural language processing. Its goal is to automatically generate a description of a video that covers the main content accurately and completely while following basic grammatical structure. Existing video captioning methods still fall short in the interpretability of the generation process and the accuracy of the generated content. To address this, the paper proposes a video captioning method, built on the encoder-decoder framework, that fuses semantic information with visual reasoning features. The method modifies the decoding stage and introduces three feature fusion networks (a feature-involved fusion network, a feature-guided fusion network, and a weighted fusion network) that fuse the semantic features of a video with its visual reasoning features, so that the generated descriptions are both interpretable and accurate. Ablation and comparison experiments on the MSVD and MSRVTT datasets show that, compared with the base model, the proposed method improves the CIDEr score by 21.6% and 3.5%, respectively. Comparison with other methods shows that the proposed method is reasonably competitive across the metrics.
Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]