Authors: Zhuang Tianming; Qin Zhen[1]; Geng Ji[1]; Zhang Hanwen (Network & Data Security Key Laboratory of Sichuan Province, University of Electronic Science & Technology of China, Chengdu 610054, China)
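Affiliation: [1] Network & Data Security Key Laboratory of Sichuan Province, University of Electronic Science & Technology of China, Chengdu 610054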
Source: Application Research of Computers (《计算机应用研究》), 2025, Issue 4, pp. 1239-1247 (9 pages)
Funding: National Natural Science Foundation of China (62372083, 62072074, 62070654, 62027827, 62020447); Sichuan Science and Technology Support Program (2024NSFTD0005, 2023YFS0020, 2023YFS0197, 2023FG0148); CCF-Baidu Open Fund (202312).
Abstract: Many recent action recognition studies model the human skeleton as a topology graph and use graph convolutional networks to extract action features. However, the graph topology is inherently shared and static during training, which limits model performance. To address this issue, this paper proposes ASGC-STT, a human action recognition method based on adaptive spatial graph convolution and a spatio-temporal Transformer. First, it proposes an adaptive spatial graph convolutional network with a non-shared graph topology: the topology is unique to each network layer, which allows more diverse features to be extracted, while multi-scale temporal convolutions capture high-level temporal features. Second, it introduces a spatio-temporal Transformer module that accurately captures long-range correlations between arbitrary joints within and across frames, modeling action representations that include both local and global joint relationships. Finally, it designs a multi-scale residual aggregation module that uses a hierarchical residual structure to effectively expand the receptive field and capture multi-scale dependencies in both the spatial and temporal domains. ASGC-STT achieves accuracies of 92.7% (X-Sub) and 96.9% (X-View) on the large-scale dataset NTU-RGB+D 60, 88.2% (X-Sub) and 89.5% (X-Set) on NTU-RGB+D 120, and 38.6% (top-1) and 61.4% (top-5) on Kinetics Skeleton 400. The experimental results show that ASGC-STT offers superior performance and generalization in human action recognition tasks.
Classification No.: TP37 [Automation and Computer Technology: Computer System Architecture]