Authors: LU Xianling (卢先领); YANG Jiaqi (杨嘉琦)[1]
Affiliations: [1] Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi, Jiangsu 214122, China; [2] School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Source: Journal of Signal Processing (《信号处理》), 2024, No. 4, pp. 766-775 (10 pages)
Funding: National Natural Science Foundation of China (61773181).
Abstract: Mainstream skeleton-based action recognition methods train the joint stream, the bone stream, and their corresponding motion streams as separate branches of a multi-stream network, which results in high training costs. In addition, the modeling of complex spatio-temporal dependencies is neglected during feature extraction, and large-kernel convolution is used for information exchange in the temporal domain, leading to the aggregation of a large amount of redundant information. To address these problems, a spatio-temporally correlated Transformer method for skeleton-based action recognition is proposed. First, a motion fusion module takes the joint and bone streams as a two-stream input and fuses their motion information at the feature level, removing the cost of training separate motion streams. Second, a shift Transformer module exploits the temporal shift operation, which mixes spatio-temporal information, so that the Transformer can capture short-term spatio-temporal dependencies at low cost. Then, a multi-scale temporal convolution is designed for long-term information exchange in the temporal domain. Finally, the two-stream scores are fused to obtain the final classification prediction. Experiments on the large-scale NTU RGB+D and NTU RGB+D 120 datasets show that the model reaches recognition accuracies of 91.5% and 96.3% under the X-Sub and X-View protocols of NTU RGB+D, respectively, and 87.2% and 89.3% under the X-Sub and X-Set protocols of NTU RGB+D 120, respectively, a clear improvement over mainstream skeleton-based action recognition methods, which verifies the effectiveness and generality of the model.
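To make the two mechanisms named in the abstract concrete, the sketch below illustrates a TSM-style temporal shift (the "temporal shift operation" that lets a per-frame Transformer mix adjacent-frame information cheaply) and a multi-scale temporal convolution built from parallel dilated branches (a common replacement for a single large-kernel temporal convolution). This is a minimal, hypothetical PyTorch sketch, not the authors' released code; the tensor layout (N, C, T, V), the channel fold ratio, and the kernel size and dilations are illustrative assumptions.

```python
import torch
import torch.nn as nn


def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels one frame forward/backward along T.

    x: skeleton features of shape (N, C, T, V). A Transformer applied per
    frame after this shift sees a cheap mix of adjacent-frame information.
    (TSM-style sketch; the fold ratio is an assumption, not from the paper.)
    """
    n, c, t, v = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :fold, :-1] = x[:, :fold, 1:]                   # pull from the next frame
    out[:, fold:2 * fold, 1:] = x[:, fold:2 * fold, :-1]   # pull from the previous frame
    out[:, 2 * fold:] = x[:, 2 * fold:]                    # leave the rest unchanged
    return out


class MultiScaleTemporalConv(nn.Module):
    """Parallel temporal convolutions with different dilations, summed,
    standing in for one large-kernel temporal convolution."""

    def __init__(self, channels: int, kernel_size: int = 3, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(
                    channels, channels,
                    kernel_size=(kernel_size, 1),              # temporal-only kernel
                    padding=(d * (kernel_size - 1) // 2, 0),   # keep T unchanged
                    dilation=(d, 1),
                ),
                nn.BatchNorm2d(channels),
            )
            for d in dilations
        ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = sum(branch(x) for branch in self.branches)  # aggregate multi-scale context
        return self.relu(y + x)                         # residual connection


if __name__ == "__main__":
    feats = torch.randn(2, 64, 30, 25)         # 2 clips, 64 channels, 30 frames, 25 joints
    shifted = temporal_shift(feats)            # short-term spatio-temporal mixing
    out = MultiScaleTemporalConv(64)(shifted)  # long-term temporal aggregation
    print(out.shape)                           # torch.Size([2, 64, 30, 25])
```

In the paper's pipeline the shifted features would feed a spatial Transformer block per frame, with the multi-scale temporal convolution handling longer-range frame-to-frame exchange; the sketch only shows the two operations in isolation.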
Keywords: Transformer network; human skeleton; multi-scale convolution; motion information; action recognition
Classification code: TP391 (Automation and Computer Technology: Computer Application Technology)