ASGC-STT: adaptive spatial graph convolution and spatio-temporal Transformer for human action recognition


Authors: Zhuang Tianming, Qin Zhen[1], Geng Ji[1], Zhang Hanwen (Network & Data Security Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China)

Affiliation: [1] Network & Data Security Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 610054, China

Source: Application Research of Computers (计算机应用研究), 2025, No. 4, pp. 1239-1247 (9 pages)

Funding: National Natural Science Foundation of China (62372083, 62072074, 62070654, 62027827, 62020447); Sichuan Science and Technology Support Program (2024NSFTD0005, 2023YFS0020, 2023YFS0197, 2023FG0148); CCF-Baidu Open Fund (202312).

Abstract: Many recent action recognition studies have modeled the human skeleton as a topology graph and used graph convolutional networks to extract action features. However, the shared and static nature of the graph topology during training limits model performance. To address this issue, this paper proposed ASGC-STT, a human action recognition method based on adaptive spatial graph convolution and a spatio-temporal Transformer. Firstly, it proposed an adaptive spatial graph convolution network with a non-shared graph topology, where the topology was unique to each network layer, enabling the extraction of more diverse features; it also used multi-scale temporal convolutions to capture high-level temporal features. Secondly, it introduced a spatio-temporal Transformer module that accurately captured long-range correlations between arbitrary joints within and across frames, modeling action representations that include both local and global joint relationships. Finally, it designed a multi-scale residual aggregation module, which employed a hierarchical residual structure to effectively expand the receptive field and capture multi-scale dependencies in both the spatial and temporal domains. ASGC-STT achieved an accuracy of 92.7% (X-Sub) and 96.9% (X-View) on the large-scale dataset NTU-RGB+D 60, 88.2% (X-Sub) and 89.5% (X-Set) on NTU-RGB+D 120, and 38.6% (top-1) and 61.4% (top-5) on Kinetics Skeleton 400. Experimental results demonstrate that ASGC-STT offers superior performance and generalization in human action recognition tasks.
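The sketch below illustrates the two ideas named first in the abstract: a spatial graph convolution whose adjacency matrix is a per-layer learnable parameter (the non-shared topology), followed by parallel multi-scale temporal convolutions. The class names, tensor shapes, joint count, and dilation settings are illustrative assumptions for a PyTorch reading of the abstract, not the authors' released implementation.

```python
# Minimal sketch (assumed shapes: input (batch, channels, frames, joints),
# V = 25 joints as in NTU RGB+D). Names and hyperparameters are hypothetical.
import torch
import torch.nn as nn


class AdaptiveSpatialGraphConv(nn.Module):
    """Spatial graph convolution whose adjacency is a per-layer learnable
    parameter (non-shared topology), initialised from the skeleton graph."""

    def __init__(self, in_channels, out_channels, skeleton_adj):
        super().__init__()
        # Each layer owns its own copy of the topology, so different layers
        # can learn different joint relations during training.
        self.adj = nn.Parameter(skeleton_adj.clone().float())
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):               # x: (N, C, T, V)
        x = self.proj(x)                # pointwise feature transform
        # For every frame, each joint aggregates features from the joints
        # it is connected to under the learned adjacency.
        return torch.einsum('nctv,vw->nctw', x, self.adj)


class MultiScaleTemporalConv(nn.Module):
    """Parallel temporal convolutions with different dilations, concatenated
    to cover short- and long-range motion patterns."""

    def __init__(self, channels, dilations=(1, 2, 3, 4)):
        super().__init__()
        branch_c = channels // len(dilations)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, branch_c, kernel_size=(5, 1),
                      padding=(2 * d, 0), dilation=(d, 1))
            for d in dilations
        )

    def forward(self, x):               # x: (N, C, T, V)
        return torch.cat([b(x) for b in self.branches], dim=1)


if __name__ == '__main__':
    V = 25                              # joints per skeleton in NTU RGB+D
    skeleton_adj = torch.eye(V)         # placeholder topology
    block = nn.Sequential(
        AdaptiveSpatialGraphConv(3, 64, skeleton_adj),
        MultiScaleTemporalConv(64),
    )
    out = block(torch.randn(2, 3, 100, V))   # 2 clips, 100 frames
    print(out.shape)                    # torch.Size([2, 64, 100, 25])
```

Holding the adjacency as a separate nn.Parameter in every layer is what makes the topology non-shared: stacking several such blocks lets each layer drift toward its own joint-relation pattern during training rather than reusing one fixed skeleton graph.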

Keywords: human action recognition; spatio-temporal features; graph convolutional network; multi-scale modeling

CLC number: TP37 [Automation and Computer Technology - Computer System Architecture]

 
