Authors: LU Xianling (卢先领); YANG Jiaqi (杨嘉琦)[1]
Affiliations: [1] Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi, Jiangsu 214122, China; [2] School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Source: Journal of Signal Processing (《信号处理》), 2024, No. 4, pp. 766-775 (10 pages)
Funding: National Natural Science Foundation of China (61773181).
Abstract: Mainstream skeleton-based action recognition methods train the joint stream, the bone stream, and their corresponding motion streams as separate branches of a multi-stream network, which results in high training costs. In addition, the modeling of complex spatio-temporal dependencies is neglected during feature extraction, and large-kernel convolution is used for information exchange in the temporal domain, leading to the aggregation of a large amount of redundant information. To address these problems, a spatio-temporally correlated Transformer method for skeleton-based action recognition is proposed. First, a motion fusion module takes the joint and bone streams as a two-stream input and fuses their motion information at the feature level, removing the cost of training separate motion streams. Second, a shift Transformer module exploits the temporal shift operation, which mixes spatio-temporal information, so that the Transformer can capture short-term spatio-temporal dependencies at low cost. Then, a multi-scale temporal convolution is designed for long-term information exchange in the temporal domain. Finally, the two-stream scores are fused to obtain the final classification prediction. Experiments on the large-scale NTU RGB+D and NTU RGB+D 120 datasets show that the model reaches recognition accuracies of 91.5% and 96.3% under the X-Sub and X-View protocols of NTU RGB+D, respectively, and 87.2% and 89.3% under the X-Sub and X-Set protocols of NTU RGB+D 120, respectively, a clear improvement over mainstream skeleton-based action recognition methods, which verifies the effectiveness and generality of the model.
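To make the two mechanisms named in the abstract concrete, the sketch below illustrates a TSM-style temporal shift (the "temporal shift operation" that lets a per-frame Transformer mix adjacent-frame information cheaply) and a multi-scale temporal convolution built from parallel dilated branches (a common replacement for a single large-kernel temporal convolution). This is a minimal, hypothetical PyTorch sketch, not the authors' released code; the tensor layout (N, C, T, V), the channel fold ratio, and the kernel size and dilations are illustrative assumptions.

```python
import torch
import torch.nn as nn


def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels one frame forward/backward along T.

    x: skeleton features of shape (N, C, T, V). A Transformer applied per
    frame after this shift sees a cheap mix of adjacent-frame information.
    (TSM-style sketch; the fold ratio is an assumption, not from the paper.)
    """
    n, c, t, v = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :fold, :-1] = x[:, :fold, 1:]                   # pull from the next frame
    out[:, fold:2 * fold, 1:] = x[:, fold:2 * fold, :-1]   # pull from the previous frame
    out[:, 2 * fold:] = x[:, 2 * fold:]                    # leave the rest unchanged
    return out


class MultiScaleTemporalConv(nn.Module):
    """Parallel temporal convolutions with different dilations, summed,
    standing in for one large-kernel temporal convolution."""

    def __init__(self, channels: int, kernel_size: int = 3, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(
                    channels, channels,
                    kernel_size=(kernel_size, 1),              # temporal-only kernel
                    padding=(d * (kernel_size - 1) // 2, 0),   # keep T unchanged
                    dilation=(d, 1),
                ),
                nn.BatchNorm2d(channels),
            )
            for d in dilations
        ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = sum(branch(x) for branch in self.branches)  # aggregate multi-scale context
        return self.relu(y + x)                         # residual connection


if __name__ == "__main__":
    feats = torch.randn(2, 64, 30, 25)         # 2 clips, 64 channels, 30 frames, 25 joints
    shifted = temporal_shift(feats)            # short-term spatio-temporal mixing
    out = MultiScaleTemporalConv(64)(shifted)  # long-term temporal aggregation
    print(out.shape)                           # torch.Size([2, 64, 30, 25])
```

In the paper's pipeline the shifted features would feed a spatial Transformer block per frame, with the multi-scale temporal convolution handling longer-range frame-to-frame exchange; the sketch only shows the two operations in isolation.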
Keywords: Transformer network; human skeleton; multi-scale convolution; motion information; action recognition
Classification code: TP391 (Automation and Computer Technology: Computer Application Technology)