Authors: LIU Suolan [1,2]; TIAN Zhenzhen; WANG Hongyuan [1]; LIN Long; WANG Yan
Affiliations: [1] School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, Jiangsu 213164, China; [2] Jiangsu Key Laboratory of Image and Video Understanding for Social Security (Nanjing University of Science and Technology), Nanjing, Jiangsu 210094, China
Source: Journal of Computer Applications, 2023, No. 10, pp. 3236-3243 (8 pages)
Funding: National Natural Science Foundation of China (61976028); Open Project of the Jiangsu Key Laboratory of Image and Video Understanding for Social Security (J2021-2)
Abstract: To address the insufficient mining of potential associations between distant joints in human action recognition tasks, and the high training cost incurred by using multi-modal data, a multi-scale feature fusion method for human action recognition under a single modality was proposed. Firstly, global feature correlation was performed on the original human skeleton graph, and coarse-scale global features were used to capture connections between distant joints. Secondly, the globally correlated graph was partitioned locally to obtain Complementary Subgraphs with Global Features (CSGFs), fine-scale features were used to establish strong correlations, and multi-scale feature complementarity was formed. Finally, the CSGFs were fed into a spatial-temporal graph convolution module for feature extraction, and the extracted results were aggregated to output the final classification. Experimental results show that the accuracy of the proposed method on the authoritative action recognition dataset NTU RGB+D 60 is 89.0% (X-sub) and 94.2% (X-view); on the challenging large-scale dataset NTU RGB+D 120, the accuracy is 83.3% (X-sub) and 85.0% (X-setup), which is 1.4 and 0.9 percentage points higher than that of single-modality ST-TR (Spatial-Temporal TRansformer), and 4.1 and 3.5 percentage points higher than that of the lightweight SGN (Semantics-Guided Network). The proposed method can thus fully exploit the synergistic complementarity of multi-scale features and effectively improve the recognition accuracy and training efficiency of the model under a single modality.
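The two-scale pipeline described in the abstract (a coarse-scale graph linking distant joints, a fine-scale graph over physical bones, and fusion of the two branches before classification) can be sketched minimally as follows. The 5-joint chain skeleton, the fully connected coarse adjacency, and fusion by summing branch outputs are illustrative assumptions, not the paper's exact CSGF construction:

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in standard GCNs
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def graph_conv(X, A_norm, W):
    # One spatial graph-convolution step: aggregate neighbors, project, ReLU
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy 5-joint chain skeleton (fine scale: physical bone connections only)
n = 5
A_fine = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A_fine[i, j] = A_fine[j, i] = 1.0

# Coarse scale: every joint pair connected, standing in for the
# global feature correlation that captures distant-joint links
A_coarse = np.ones((n, n)) - np.eye(n)

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 3))        # per-joint features (e.g. 3D coordinates)
W_fine = rng.standard_normal((3, 4))   # branch weights (hypothetical sizes)
W_coarse = rng.standard_normal((3, 4))

# Fuse the complementary scales by summing the two branch outputs
out = graph_conv(X, normalize_adj(A_fine), W_fine) + \
      graph_conv(X, normalize_adj(A_coarse), W_coarse)
print(out.shape)  # (5, 4)
```

In the paper, this spatial step would be repeated inside a spatial-temporal graph convolution module over a sequence of frames; the sketch shows only the single-frame, two-scale fusion idea.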
Keywords: human action recognition; skeleton joints; graph convolutional network; single modality; multi-scale; feature fusion
Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]