Algorithm for Video Temporal Action Proposal Combining Watershed and Regression Networks

Authors: Huang Yunwen, Wang Fei, Li Jinghong[1], Wang Guorui

Affiliations: [1] College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819, China; [2] Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning 110169, China

Source: Chinese Journal of Lasers (中国激光), 2019, no. 11, pp. 270-278 (9 pages)

Abstract: A two-stage action proposal network is designed for the temporal action proposal task. The first stage applies a modified watershed algorithm to a one-dimensional temporal signal and uses immersion clustering to generate candidate regions of various lengths, giving a coarse localization of the temporal boundaries of actions. A temporal pyramid structuring method is then proposed that adds a context module for each action segment and models the body and the context of every candidate region in a structured way, producing an enhanced global feature. The second stage applies a temporal-coordinate regression algorithm to localize the action boundaries, while an action/background classifier filters out background candidates, so that more accurate temporal boundaries are obtained. The whole network is trained on unit-level features extracted by a three-dimensional convolutional neural network (C3D), which capture rich spatial and temporal semantics of the video; this improves the accuracy of the algorithm and substantially improves training efficiency. Experiments on two large-scale benchmark datasets, Thumos 14 and ActivityNet, show that the proposed two-stage algorithm achieves the best average recall among the compared methods and can effectively improve the precision of action localization.
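The first stage applies watershed-style immersion clustering to a one-dimensional signal. The abstract does not describe the authors' modifications, so the following is only a minimal sketch under stated assumptions: the signal is a hypothetical per-unit actionness score in [0, 1], the water level is swept over a hypothetical list `levels`, and neighbouring basins are merged while at least a fraction `tau` of the merged span lies inside basins. All names are illustrative, not the paper's code.

```python
from typing import List, Sequence, Tuple

Proposal = Tuple[int, int]  # half-open [start, end) in unit indices


def basins(scores: Sequence[float], level: float) -> List[Proposal]:
    """Maximal runs of units whose score is >= level.

    Treating 1 - score as terrain, these are the basins that are under
    water once the water surface reaches height 1 - level.
    """
    runs: List[Proposal] = []
    start = None
    for i, s in enumerate(scores):
        if s >= level and start is None:
            start = i  # entering a flooded basin
        elif s < level and start is not None:
            runs.append((start, i))  # leaving the basin
            start = None
    if start is not None:  # basin reaches the end of the video
        runs.append((start, len(scores)))
    return runs


def group_basins(runs: List[Proposal], tau: float) -> List[Proposal]:
    """Greedily merge consecutive basins starting from each seed basin.

    Merging stops once less than `tau` of the merged span is covered by
    basin units; every intermediate merge is kept, so one seed can yield
    candidates of several lengths.
    """
    proposals = set()
    for i in range(len(runs)):
        covered = 0
        for j in range(i, len(runs)):
            covered += runs[j][1] - runs[j][0]
            start, end = runs[i][0], runs[j][1]
            if covered / (end - start) < tau:
                break
            proposals.add((start, end))
    return sorted(proposals)


def watershed_proposals(scores: Sequence[float], levels: Sequence[float],
                        tau: float) -> List[Proposal]:
    """Union of grouped basins over all hypothetical water levels."""
    props = set()
    for level in levels:
        props.update(group_basins(basins(scores, level), tau))
    return sorted(props)
```

Sweeping several water levels and keeping every intermediate merge is what yields candidates of different lengths from a single signal. For example, with scores `[0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.1, 0.6, 0.9, 0.3]`, `levels=[0.5]` and `tau=0.7`, the sketch returns `[(1, 3), (1, 5), (4, 5), (7, 9)]`.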
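The temporal pyramid structuring step combines a candidate's body with its surrounding context into one global feature. The sketch below is an illustration, not the paper's configuration: it assumes average pooling over per-unit feature vectors (for example, C3D unit features), a two-level pyramid over the body (the whole segment, then its two halves), and context windows on each side whose length is a hypothetical fraction `context_ratio` of the proposal length.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(feats: Sequence[Vector], start: int, end: int) -> Vector:
    """Average the unit features in [start, end), clipped to the video.

    Returns a zero vector when the clipped range is empty (e.g. context
    that would fall before the first unit).
    """
    dim = len(feats[0])
    start, end = max(start, 0), min(end, len(feats))
    if end <= start:
        return [0.0] * dim
    window = feats[start:end]
    return [sum(f[d] for f in window) / len(window) for d in range(dim)]


def pyramid_context_feature(
    feats: Sequence[Vector],
    proposal: Tuple[int, int],
    context_ratio: float = 0.5,
    levels: Tuple[int, ...] = (1, 2),
) -> Vector:
    """Concatenate [start context | body pyramid | end context]."""
    start, end = proposal
    length = end - start
    ctx = max(1, int(context_ratio * length))  # context length in units
    parts = [mean_pool(feats, start - ctx, start)]  # starting context
    for n in levels:  # level n splits the body into n equal parts
        for k in range(n):
            a = start + (length * k) // n
            b = start + (length * (k + 1)) // n
            parts.append(mean_pool(feats, a, b))
    parts.append(mean_pool(feats, end, end + ctx))  # ending context
    return [x for part in parts for x in part]
```

Because every part is pooled to a fixed size, the output length is `dim * (2 + sum(levels))` regardless of the proposal's duration, which is what lets candidates of different lengths share one downstream classifier and regressor.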
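The second stage refines boundaries by regression and discards background candidates. The abstract does not state the regression targets, so this sketch assumes a common center/log-length parametrization and a hypothetical action-probability `threshold`; in the described network, the offsets and probabilities would come from the regression and action/background heads.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) on the video time axis


def encode(proposal: Segment, gt: Segment) -> Tuple[float, float]:
    """Regression targets: center shift relative to the proposal length,
    and log ratio of the ground-truth length to the proposal length."""
    ps, pe = proposal
    gs, ge = gt
    p_len, g_len = pe - ps, ge - gs
    p_ctr, g_ctr = (ps + pe) / 2, (gs + ge) / 2
    return (g_ctr - p_ctr) / p_len, math.log(g_len / p_len)


def decode(proposal: Segment, offsets: Tuple[float, float]) -> Segment:
    """Invert `encode`: apply predicted offsets to a proposal."""
    ps, pe = proposal
    p_len, p_ctr = pe - ps, (ps + pe) / 2
    d_ctr, d_len = offsets
    ctr = p_ctr + d_ctr * p_len
    length = p_len * math.exp(d_len)
    return ctr - length / 2, ctr + length / 2


def refine(proposals: Sequence[Segment],
           offsets: Sequence[Tuple[float, float]],
           action_probs: Sequence[float],
           threshold: float = 0.5) -> List[Segment]:
    """Drop candidates the action/background head scores as background,
    then regress the boundaries of the remaining ones."""
    return [
        decode(p, o)
        for p, o, prob in zip(proposals, offsets, action_probs)
        if prob >= threshold
    ]
```

Encoding a ground-truth segment (12, 24) against a proposal (10, 20) gives targets (0.3, ln 1.2), and decoding those targets recovers (12, 24) up to floating-point rounding. Normalizing by the proposal length keeps the targets on a similar scale for short and long candidates.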
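Proposal quality is reported as average recall. A simplified single-video version is sketched below. It assumes that recall is averaged over a list of temporal IoU (tIoU) thresholds and that `proposals` already holds only the top-ranked candidates being evaluated.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection over union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_recall(proposals: Sequence[Segment],
                   ground_truths: Sequence[Segment],
                   thresholds: Sequence[float]) -> float:
    """Mean over tIoU thresholds of the fraction of ground-truth segments
    matched by at least one proposal."""
    recalls = []
    for t in thresholds:
        hits = sum(
            1 for g in ground_truths
            if any(tiou(p, g) >= t for p in proposals)
        )
        recalls.append(hits / len(ground_truths))
    return sum(recalls) / len(recalls)
```

Benchmarks typically sweep tIoU thresholds such as 0.5 to 0.95 in steps of 0.05 and accumulate matches over all videos; the exact protocol used in the paper is not stated in the abstract.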

Keywords: machine vision; video temporal detection; action localization; pyramid pooling; temporal context

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
