Multi-Temporal Scales Consensus for Weakly Supervised Temporal Action Localization

Cited by: 3

Authors: GUO Wenbin, YANG Xingming [1], JIANG Zheyuan [1], WU Kewei [1,2], XIE Zhao [1]

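Affiliations: [1] School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China; [2] Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230009, China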

Source: Computer Engineering and Applications, 2023, Issue 10, pp. 151-161 (11 pages)

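Funding: Anhui Provincial Key Research and Development Program (202004d07020004); Natural Science Foundation of Anhui Province (2108085MF203); Fundamental Research Funds for the Central Universities (PA2021GDSK0072, JZ2021HGQA0219).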

Abstract: Weakly supervised temporal action localization models use only video-level labels as the supervision signal. As a result, while such a model identifies the most discriminative segments of an action instance, it also mistakes background segments related to the video-level label for actions, which makes it difficult to generate complete action proposals. To detect action segments more completely, this paper analyzes the consistency of action-segment labels across multiple temporal scales and proposes a multi-temporal-scale consensus method for weakly supervised temporal action localization. First, RGB and optical-flow features are extracted from the input video frames, and a multi-temporal-scale module models the temporal relationships in the video using convolution kernels of different sizes. Second, a temporal class activation map is estimated from the features at each temporal scale, and the temporal class activation maps of the multiple branches are fused to obtain predicted action labels that are consistent across temporal scales. Finally, to further refine the predicted action labels, an iterative optimization strategy updates the predicted labels in each iteration and provides effective frame-level supervision signals for model training. Experiments on the THUMOS14 and ActivityNet1.3 datasets show that the proposed method outperforms existing weakly supervised temporal action localization methods.
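The idea of multi-scale consensus described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: the abstract gives no architectural details, so the box (averaging) filters below stand in for learned convolution kernels, plain averaging stands in for the fusion step, and the names `temporal_conv`, `multi_scale_consensus`, `pseudo_labels`, `segments`, the kernel sizes, and the threshold are all hypothetical. The input is assumed to be a per-snippet activation score for a single action class.

```python
from typing import List, Sequence, Tuple


def temporal_conv(scores: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D convolution over time with zero padding, output length equal to input.

    Assumes an odd kernel length so the window is centered on each time step.
    """
    k = len(kernel)
    pad = k // 2
    n = len(scores)
    out = []
    for t in range(n):
        acc = 0.0
        for j in range(k):
            idx = t + j - pad
            if 0 <= idx < n:  # positions outside the video count as zero
                acc += kernel[j] * scores[idx]
        out.append(acc)
    return out


def multi_scale_consensus(
    scores: Sequence[float], kernel_sizes: Sequence[int] = (1, 3, 5)
) -> List[float]:
    """Run one branch per kernel size and average the branch outputs.

    Assumption: each branch is a fixed box filter, a stand-in for a learned
    convolution; the mean is a stand-in for the paper's fusion.
    """
    branches = [temporal_conv(scores, [1.0 / k] * k) for k in kernel_sizes]
    return [sum(b[t] for b in branches) / len(branches) for t in range(len(scores))]


def pseudo_labels(fused: Sequence[float], threshold: float = 0.4) -> List[int]:
    """Binarize fused scores into frame-level pseudo labels (hypothetical threshold)."""
    return [1 if s >= threshold else 0 for s in fused]


def segments(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Group consecutive 1s into (start, end) proposals, with end exclusive."""
    segs = []
    start = None
    for t, y in enumerate(labels):
        if y and start is None:
            start = t
        elif not y and start is not None:
            segs.append((start, t))
            start = None
    if start is not None:
        segs.append((start, len(labels)))
    return segs


if __name__ == "__main__":
    # A single action whose middle snippet (index 2) has a weak activation.
    scores = [0.1, 0.9, 0.2, 0.9, 0.9, 0.1, 0.0, 0.0]
    print(segments(pseudo_labels(scores)))                         # single scale
    print(segments(pseudo_labels(multi_scale_consensus(scores))))  # fused scales
```

On this toy input, thresholding the raw scores splits the action into two fragments, while the fused multi-scale scores yield one continuous proposal. In an iterative scheme like the one the abstract describes, such pseudo labels would be fed back as frame-level supervision for the next training round; that loop is omitted here because the abstract does not specify it.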

Keywords: weakly supervised temporal action localization; multi-temporal scale; consensus

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
