Research on a Multilevel Cross-modal Alignment Method for Text-based Video Retrieval

A Multilevel Cross-modal Alignment for Textbase-Video Retrieval

Authors: XI Yimeng (习怡萌); LIU Libo (刘立波); DENG Zhen (邓箴); LIU Qian (刘倩) (College of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021, China; Ningxia Key Laboratory of Artificial Intelligence and Information Security for Channeling Computing Resources from the East to the West, Yinchuan, Ningxia 750021, China)

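Affiliations: [1] College of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021, China; [2] Ningxia Key Laboratory of Artificial Intelligence and Information Security for Channeling Computing Resources from the East to the West, Yinchuan, Ningxia 750021, China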

Source: Journal of Chinese Information Processing (《中文信息学报》), 2025, Issue 2, pp. 111-122 (12 pages)

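Funding: National Natural Science Foundation of China (62262053); Ningxia Science and Technology Innovation Leading Talent Project (2022GKLRLX03); Ningxia Higher Education Scientific Research Project (NYG2024023); Ningxia University Graduate Innovation Project (CXXM202406).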

Abstract: Existing text-based video retrieval methods do not fully account for the interaction between textual details and complex visual semantics during cross-modal alignment, which degrades retrieval performance. To address this problem, this paper proposes a multilevel cross-modal alignment method for text-based video retrieval. First, the query text is decomposed by part of speech and encoded, while the video frames are encoded and clustered. Then, the global encodings of the query text and the video are aligned to capture the global semantic relationship between them. Next, action alignment is performed between the textual verb encodings and the video sub-action encodings to establish action correlations. Finally, entity alignment is performed between the noun encodings and the key frames selected through action alignment, which further suppresses weakly related or unrelated frames in the video and improves the relevance between text and video. Experiments show that the method improves R@1 by 2.3%, 1.5%, and 0.9% on the public MSR-VTT, DiDeMo, and LSMDC datasets, respectively, outperforming existing text-based video retrieval methods.
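The three alignment levels described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not specify the encoders, the clustering algorithm, or how the levels are combined. The sketch therefore assumes precomputed embedding vectors, substitutes a contiguous split of the frame sequence for the clustering step, uses cosine similarity with max-over-candidates matching, and averages the three level scores with equal weight. All function names (`cosine`, `mean_pool`, `split_into_segments`, `multilevel_score`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def split_into_segments(frames, k):
    """Assumed stand-in for clustering: cut the frame sequence into k contiguous chunks."""
    k = max(1, min(k, len(frames)))
    bounds = [i * len(frames) // k for i in range(k + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(k)]


def multilevel_score(text_global, verb_vecs, noun_vecs, frames, k=2):
    """Score one (query, video) pair at the global, action, and entity levels."""
    # Level 1: global alignment between the whole query and the pooled video.
    global_score = cosine(text_global, mean_pool(frames))

    # Level 2: action alignment between verb vectors and sub-action vectors.
    segments = split_into_segments(frames, k)
    sub_actions = [mean_pool(seg) for seg in segments]
    action_sims, selected = [], set()
    for verb in verb_vecs:
        sims = [cosine(verb, sub) for sub in sub_actions]
        best = max(range(len(sims)), key=sims.__getitem__)
        action_sims.append(sims[best])
        selected.add(best)
    action_score = sum(action_sims) / len(action_sims) if action_sims else 0.0

    # Key frames are the frames of the segments chosen by action alignment;
    # with no verbs in the query, every frame is kept (an assumption).
    if selected:
        key_frames = [f for i in sorted(selected) for f in segments[i]]
    else:
        key_frames = list(frames)

    # Level 3: entity alignment between noun vectors and the key frames only.
    entity_sims = [max(cosine(noun, f) for f in key_frames) for noun in noun_vecs]
    entity_score = sum(entity_sims) / len(entity_sims) if entity_sims else 0.0

    # Equal-weight fusion is an assumption, not taken from the paper.
    total = (global_score + action_score + entity_score) / 3
    return {"global": global_score, "action": action_score,
            "entity": entity_score, "total": total}


# Toy 2-D embeddings: the second half of the matching video fits the verb and noun.
matching_video = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
other_video = [[1.0, 0.0]] * 4
query = {"text_global": [1.0, 1.0], "verb_vecs": [[0.0, 1.0]], "noun_vecs": [[0.0, 1.0]]}

match = multilevel_score(frames=matching_video, **query)
other = multilevel_score(frames=other_video, **query)
print(match["total"] > other["total"])
```

In this toy setup the video whose later frames match the verb and noun vectors ranks above the video that matches only at the global level, which is the effect the action-filtered entity alignment is meant to produce.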

Keywords: text-based video retrieval; text decomposition; video key frame extraction

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
