Cross-View Temporal Contrastive Learning for Self-Supervised Video Representation

Authors: WANG Lulu (王露露); XU Zengmin (徐增敏); ZHANG Xuelian (张雪莲) [1,2]; MENG Ruxing (蒙儒省); LU Tao (卢涛)

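Affiliations: [1] Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation, School of Mathematics and Computing Science, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; [2] Center for Applied Mathematics of Guangxi (GUET), Guilin, Guangxi 541004, China; [3] Anview.ai, Guilin, Guangxi 541010, China; [4] Hubei Key Laboratory of Intelligent Robot, School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China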

Source: Computer Engineering and Applications, 2024, No. 18, pp. 158-166 (9 pages)

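Funding: Guangxi Natural Science Foundation (2024GXNSFAA010493); National Natural Science Foundation of China (61862015, 62072350); Guangxi Science and Technology Base and Talent Special Project (AD23023002, AD21220114); Guangxi Key Research and Development Program (AB17195025).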

Abstract: Existing self-supervised representation algorithms focus mainly on short-term motion between video frames, yet the action sequence changes little from frame to frame. Single-view data is also semantically limited, which weakens the expressive power of deep features and leaves the rich multi-view information in video actions underused. To address this, the paper proposes a temporal contrastive learning algorithm based on cross-view semantic consistency that learns, in a self-supervised manner, the temporal dynamics of actions contained in two kinds of data: RGB frames and optical flow fields. The approach has two parts. First, a local temporal contrastive learning method applies different positive and negative sample partitioning strategies to mine the temporal correlation and discriminative separability between non-overlapping clips of the same instance, strengthening fine-grained feature representation. Second, a global contrastive learning method uses cross-view semantic co-training to add positive samples and learn the semantic consistency across different views of multiple instances, improving the model's generalization ability. The model is evaluated on two downstream tasks; experiments on the UCF101 and HMDB51 datasets show that the proposed method improves on leading mainstream methods by 2 to 3.5 percentage points on average in action recognition and video retrieval.
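The abstract does not give the loss functions, so the following is only a minimal sketch of the general idea, assuming a standard multi-positive InfoNCE objective with cosine similarity. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper. It shows how the choice of positives changes between a local temporal setting (a non-overlapping clip of the same video) and a cross-view setting (an optical-flow embedding added as an extra positive).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positives, negatives, temperature=0.1):
    """Multi-positive InfoNCE: -log(sum_pos / (sum_pos + sum_neg)).

    Generic contrastive loss used here as an assumption; the paper's
    actual formulation may differ.
    """
    pos = sum(math.exp(cosine(anchor, p) / temperature) for p in positives)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (hypothetical values, for illustration only).
rgb_clip_a = [1.0, 0.1, 0.0]   # video 1, RGB view, clip A (the anchor)
rgb_clip_b = [0.9, 0.2, 0.1]   # video 1, RGB view, non-overlapping clip B
flow_clip_a = [0.8, 0.0, 0.2]  # video 1, optical-flow view, clip A
rgb_other = [0.0, 1.0, 0.1]    # video 2, RGB view
flow_other = [0.1, 0.9, 0.0]   # video 2, optical-flow view

# Local temporal setting: the non-overlapping clip of the same video
# is the positive; a clip from another video is the negative.
local_loss = info_nce(rgb_clip_a, [rgb_clip_b], [rgb_other])

# Cross-view setting: the optical-flow embedding of the same video is
# added to the positive set, enlarging it beyond a single view.
cross_view_loss = info_nce(
    rgb_clip_a,
    [rgb_clip_b, flow_clip_a],
    [rgb_other, flow_other],
)

print(f"local temporal loss: {local_loss:.4f}")
print(f"cross-view loss:     {cross_view_loss:.4f}")
```

In a real pipeline the embeddings would come from trained RGB and optical-flow encoders, and the alternative partitioning strategy mentioned in the abstract (treating non-overlapping clips as negatives to encourage discriminability) would correspond to moving `rgb_clip_b` from the positive list to the negative list.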

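Keywords: self-supervised learning; video representation learning; temporal contrastive learning; local contrastive learning; cross-view co-training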

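Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP183 [Automation and Computer Technology: Computer Science and Technology]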
