Action recognition with hierarchical convolutional neural networks features and bi-directional long short-term memory model (article in English)  Cited by: 12


Authors: GE Rui (葛瑞); WANG Zhao-hui (王朝晖) [1]; XU Xin (徐鑫); JI Yi (季怡) [1]; LIU Chun-ping (刘纯平) [1,2,3]; GONG Sheng-rong (龚声蓉) (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu 210046, China; School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China)

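Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000, China [2] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China [3] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu 210046, China [4] School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China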

Source: Control Theory & Applications (《控制理论与应用》), 2017, No. 6, pp. 790-796 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (61170124, 61272258, 61301299, 61272005, 61572085); the Provincial Natural Science Foundation of Jiangsu (BK20151254, BK20151260); the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172016K08); a Prospective Joint Research Project of the Joint Innovation and Research Foundation of Jiangsu Province (BY2014-05914); and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Abstract: Robust action recognition in videos is a challenging task because of its complexity, and effectively capturing robust spatio-temporal features is key to solving it. In this paper, we propose a bi-directional long short-term memory (Bi-LSTM) model as the main framework for capturing bi-directional spatio-temporal features. First, to strengthen the feature representation, traditional hand-crafted descriptors are replaced by hierarchical convolutional neural network features. Features from multiple convolutional layers fuse low-level basic shape information with high-level semantic content to yield rich spatial features. The extracted convolutional features are then fed into the Bi-LSTM, which contains two LSTM layers running in opposite directions: the forward layer captures the evolution of the video from front to back, and the backward layer models the evolution in the opposite direction. Finally, the two directional representations are fused in a Softmax layer to produce the final classification result. Experiments on the UCF101 and HMDB51 datasets show that the method achieves performance comparable to state-of-the-art methods for action recognition.
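The pipeline described in the abstract (per-frame features, a forward and a backward LSTM, fusion into Softmax) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the random vectors standing in for convolutional features, the dimensions, the omission of bias terms, the use of each direction's final hidden state, and fusion by concatenation. The function names (`init_lstm`, `lstm_last_state`, `bilstm_classify`) are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_lstm(input_dim, hidden_dim, rng):
    # One weight matrix per gate (input, forget, candidate, output), each
    # acting on the concatenation [x_t, h_{t-1}]. Biases are omitted for brevity.
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {gate: mat() for gate in "ifgo"}

def lstm_last_state(params, frames, hidden_dim):
    # Run a standard LSTM over the frame sequence; return the final hidden state.
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frames:
        z = list(x) + h
        i = [sigmoid(a) for a in matvec(params["i"], z)]
        f = [sigmoid(a) for a in matvec(params["f"], z)]
        g = [math.tanh(a) for a in matvec(params["g"], z)]
        o = [sigmoid(a) for a in matvec(params["o"], z)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bilstm_classify(frames, fwd, bwd, W_out, hidden_dim):
    # Forward layer reads frames first-to-last; backward layer reads last-to-first.
    h_fwd = lstm_last_state(fwd, frames, hidden_dim)
    h_bwd = lstm_last_state(bwd, frames[::-1], hidden_dim)
    # Assumed fusion: concatenate both final states, then linear layer + softmax.
    return softmax(matvec(W_out, h_fwd + h_bwd))

rng = random.Random(0)
feat_dim, hidden_dim, num_classes, num_frames = 8, 4, 3, 5
# Random vectors stand in for per-frame convolutional features (assumption).
frames = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(num_frames)]
fwd = init_lstm(feat_dim, hidden_dim, rng)
bwd = init_lstm(feat_dim, hidden_dim, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
         for _ in range(num_classes)]
probs = bilstm_classify(frames, fwd, bwd, W_out, hidden_dim)
```

Here `probs` is a probability distribution over the (assumed) three action classes. In practice the weights would be learned, and the frame vectors would come from several layers of a convolutional network rather than a random generator.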

Keywords: action recognition; convolutional neural network; recurrent neural network; bi-directional recurrent neural network

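Classification numbers: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391.41 [Automation and Computer Technology: Control Science and Engineering]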

 
