Authors: LIU Suolan [1,2]; TIAN Zhenzhen; WANG Hongyuan [1]; LIN Long; WANG Yan
Affiliations: [1] School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, Jiangsu 213164, China; [2] Jiangsu Key Laboratory of Image and Video Understanding for Social Security (Nanjing University of Science and Technology), Nanjing, Jiangsu 210094, China
Source: Journal of Computer Applications, 2023, No. 10, pp. 3236-3243 (8 pages)
Funding: National Natural Science Foundation of China (61976028); Open Project of the Jiangsu Key Laboratory of Image and Video Understanding for Social Security (J2021-2)
Abstract: To address the insufficient mining of potential associations between distant joints in human action recognition tasks, and the high training cost incurred by using multi-modal data, a multi-scale feature fusion method for human action recognition under a single modality was proposed. Firstly, global feature correlation was performed on the original human skeleton graph, and coarse-scale global features were used to capture connections between distant joints. Secondly, the globally correlated graph was partitioned locally to obtain Complementary Subgraphs with Global Features (CSGFs), fine-scale features were used to establish strong correlations, and multi-scale feature complementarity was formed. Finally, the CSGFs were fed into a spatial-temporal graph convolution module for feature extraction, and the extracted results were aggregated to output the final classification. Experimental results show that the accuracy of the proposed method on the authoritative action recognition dataset NTU RGB+D 60 is 89.0% (X-sub) and 94.2% (X-view); on the challenging large-scale dataset NTU RGB+D 120, the accuracy is 83.3% (X-sub) and 85.0% (X-setup), which is 1.4 and 0.9 percentage points higher than that of single-modality ST-TR (Spatial-Temporal TRansformer), and 4.1 and 3.5 percentage points higher than that of the lightweight SGN (Semantics-Guided Network). The proposed method can thus fully exploit the synergistic complementarity of multi-scale features and effectively improve the recognition accuracy and training efficiency of the model under a single modality.
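The two-scale pipeline described in the abstract (a coarse-scale graph linking distant joints, a fine-scale graph over physical bones, and fusion of the two branches before classification) can be sketched minimally as follows. The 5-joint chain skeleton, the fully connected coarse adjacency, and fusion by summing branch outputs are illustrative assumptions, not the paper's exact CSGF construction:

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in standard GCNs
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def graph_conv(X, A_norm, W):
    # One spatial graph-convolution step: aggregate neighbors, project, ReLU
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy 5-joint chain skeleton (fine scale: physical bone connections only)
n = 5
A_fine = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A_fine[i, j] = A_fine[j, i] = 1.0

# Coarse scale: every joint pair connected, standing in for the
# global feature correlation that captures distant-joint links
A_coarse = np.ones((n, n)) - np.eye(n)

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 3))        # per-joint features (e.g. 3D coordinates)
W_fine = rng.standard_normal((3, 4))   # branch weights (hypothetical sizes)
W_coarse = rng.standard_normal((3, 4))

# Fuse the complementary scales by summing the two branch outputs
out = graph_conv(X, normalize_adj(A_fine), W_fine) + \
      graph_conv(X, normalize_adj(A_coarse), W_coarse)
print(out.shape)  # (5, 4)
```

In the paper, this spatial step would be repeated inside a spatial-temporal graph convolution module over a sequence of frames; the sketch shows only the single-frame, two-scale fusion idea.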
Keywords: human action recognition; skeleton joints; graph convolutional network; single modality; multi-scale; feature fusion
Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]