Video Content Description Based on Cross Attention and Semantic Perception

Authors: ZHANG Jing (张晶); ZHOU Kai (周凯); WU Wen-tao (吴文涛)

Affiliations: [1] Department of Intelligent Control, Shanxi Railway Vocational and Technical College, Taiyuan 030013, China; [2] College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China; [3] China Academy of Space Technology (Xi'an), Xi'an 710100, China

Source: Printing and Digital Media Technology Study (《印刷与数字媒体技术研究》), 2025, No. 2, pp. 213-222 (10 pages)

Funding: National Natural Science Foundation of China (No. 61802124).

Abstract: Existing video content description methods pay little attention to the activity information in a video and do not mine its key information sufficiently. To address these problems, this study proposes a video content description method based on cross attention and semantic perception. First, taking video activities as boundaries, a clustering algorithm splits the video into multiple segments of different durations, and the visual features of each segment are extracted. Then, a purpose-designed semantic perception module assigns semantic labels to the video. Finally, a cross-modal attention module is constructed to strengthen the representation of key information in the visual features and to generate the description sentences. The method was tested and validated on public datasets. The results show that the proposed model achieves significant improvements on the BLEU, METEOR, and ROUGE-L metrics and, compared with current mainstream video content description models, clearly improves word matching, semantic matching, and readability.
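The abstract names three stages (clustering-based segmentation, semantic labels, cross-modal attention) without giving their internals. The following is only a minimal, dependency-free sketch of how such a pipeline could fit together, under loudly stated assumptions: the clustering algorithm is assumed to be plain k-means over per-frame feature vectors, a segment boundary is assumed wherever the cluster label changes, segment features are assumed to be mean-pooled, and the fusion step is assumed to be scaled dot-product attention in which segment features query a set of semantic-label embeddings whose output is added back residually. All function names (`kmeans_labels`, `segment_video`, `segment_features`, `cross_attention`, `enhance`) and the toy label embeddings are hypothetical; learned query/key/value projections and the caption decoder are omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    # Element-wise average of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans_labels(frames: List[Vector], k: int, iters: int = 10) -> List[int]:
    """Plain k-means over per-frame features (assumed clustering choice)."""
    step = max(1, len(frames) // k)
    # Deterministic init: centroids start at evenly spaced frames.
    centroids = [list(frames[min(i * step, len(frames) - 1)]) for i in range(k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(f, centroids[c])) for f in frames]
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean(members)
    return labels


def segment_video(frames: List[Vector], k: int) -> List[Tuple[int, int]]:
    """Cut wherever the cluster label changes, giving variable-length [start, end) segments."""
    labels = kmeans_labels(frames, k)
    segments, start = [], 0
    for t in range(1, len(frames) + 1):
        if t == len(frames) or labels[t] != labels[start]:
            segments.append((start, t))
            start = t
    return segments


def segment_features(frames: List[Vector], segments: List[Tuple[int, int]]) -> List[Vector]:
    # Mean-pool the frame features inside each segment.
    return [mean(frames[s:e]) for s, e in segments]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query takes a softmax-weighted sum of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, key) / math.sqrt(d) for key in keys]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w / z * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def enhance(visual: List[Vector], label_embs: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: visual features query semantic-label embeddings; result is added residually."""
    attended = cross_attention(visual, label_embs, label_embs)
    return [[x + y for x, y in zip(v, a)] for v, a in zip(visual, attended)]


if __name__ == "__main__":
    frames = [[0.0, 0.0]] * 3 + [[10.0, 10.0]] * 4 + [[0.0, 0.0]] * 2
    segs = segment_video(frames, k=2)
    print(segs)  # [(0, 3), (3, 7), (7, 9)]
    feats = segment_features(frames, segs)
    toy_labels = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings for two semantic labels
    print(enhance(feats, toy_labels))
```

Because boundaries follow changes in cluster membership rather than a fixed window, the toy video above yields segments of lengths 3, 4, and 2, which mirrors the abstract's "segments of different durations". The residual addition keeps the original visual feature intact while letting the semantic labels reweight it; whether the paper fuses the modalities this way is not stated in the abstract.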

Keywords: video content description; video understanding; attention mechanism; multimodal; semantic detection

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
