Two-stage video moment retrieval with multiple contrastive learning

Authors: YAN Gang (阎刚); WANG Haotian (王浩天) (School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China)

Affiliation: [1] School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401

Source: Journal of Hebei University of Technology (《河北工业大学学报》), 2025, No. 2, pp. 32-41 (10 pages)

Funding: National Natural Science Foundation of China (62102129).

Abstract: As video resources grow more abundant, research on cross-modal video moment retrieval has gradually emerged. Because video and text come from different feature spaces, the key problem is how to learn a common feature space that bridges the semantic gap between the two kinds of data. Existing methods use cross-modal encoders to align features from the different modalities, but multiple moments in the same video interfere with one another, which makes the video representation too coarse. In addition, the cross-modal encoder is computationally expensive, which leads to long retrieval times. To address these two problems, a two-stage video moment retrieval network based on multiple contrastive learning (MCLNet) is proposed. The model applies video-level contrastive learning, clip-level contrastive learning, and intra-modal contrastive learning within the video modality to improve feature alignment and reduce interference, which addresses the overly coarse video representation. The model also splits video retrieval and moment localization into two stages, so that videos can be pre-encoded and stored in the first stage, which addresses the long retrieval time. Experimental results on two video moment retrieval datasets, TVR and DiDeMo, demonstrate the effectiveness of MCLNet.
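The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not MCLNet's implementation: the abstract does not give the loss form, the similarity measure, or the index layout, so the InfoNCE-style loss, the cosine similarity, and every name below (`info_nce`, `retrieve_moment`, `video_index`, `clip_index`) are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(query, candidates, positive_index, temperature=0.07):
    """InfoNCE-style loss (assumed form): negative log-softmax of the positive
    candidate's similarity to the query among all candidates."""
    q = normalize(query)
    logits = [dot(q, normalize(c)) / temperature for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[positive_index]

def retrieve_moment(query, video_index, clip_index, top_k=1):
    """Stage 1: rank pre-encoded videos against the query.
    Stage 2: score clips only inside the top_k videos."""
    q = normalize(query)
    ranked = sorted(video_index, key=lambda vid: dot(q, video_index[vid]), reverse=True)
    best = None
    for vid in ranked[:top_k]:
        for span, embedding in clip_index[vid]:
            score = dot(q, embedding)
            if best is None or score > best[0]:
                best = (score, vid, span)
    return best[1], best[2]

# Hypothetical toy index, encoded once offline (assumed 2-D embeddings).
video_index = {"v1": normalize([1.0, 0.0]), "v2": normalize([0.0, 1.0])}
clip_index = {
    "v1": [((0, 5), normalize([1.0, 0.2])), ((5, 10), normalize([0.6, 0.8]))],
    "v2": [((0, 4), normalize([0.1, 1.0]))],
}
print(retrieve_moment([0.9, 0.1], video_index, clip_index))
```

In this sketch the video and clip embeddings are computed once and stored, so a query costs one text encoding plus dot products. That is the kind of saving the abstract attributes to pre-encoding videos in the first stage.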

Keywords: cross-modal video moment retrieval; common feature space; feature alignment; contrastive learning; video representation

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
