Affiliations: [1] School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China; [2] Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao, Hebei 066004, China
Source: Journal of Signal Processing (《信号处理》), 2025, No. 4, pp. 683-693 (11 pages)
Funding: Young Scientists Fund of the National Natural Science Foundation of China (62001413); Natural Science Foundation of Hebei Province (F2024203069)
Abstract: Skeleton-based human action recognition has gained significant attention because removing action-irrelevant visual information reduces training complexity. However, collecting and annotating large-scale skeleton action data remains challenging. Skeleton-based One-Shot Action Recognition (SOAR) aims to identify human actions from only a single training sample, allowing robots to respond to novel action categories and thereby improving human-robot interaction. To tackle data scarcity in human activity classification with Convolutional Neural Network (CNN) encoders, we formulate the SOAR problem in terms of compact skeleton sequence representations and a Deep Metric Learning (DML) paradigm. Using a self-attention Transformer mechanism and spatial disentanglement constraints, we revisit the modeling of skeleton dynamic images for transfer to novel activity categories and propose a one-shot human action recognition model that integrates multidimensional perception and spatial disentanglement. First, 3D skeleton sequence coordinates are mapped to compact image representations. Second, a backbone network projects the input into a low-dimensional feature space to extract primary action features. Next, an embedding encoder that fuses a Multilayer Perceptron with a Transformer (MLP-TransEmbedder) captures the spatial-temporal dependencies of joints in the embedding space, enhancing the model's perception of spatiotemporal information and yielding high-level multidimensional embedded features. Similarity between samples is then measured by nearest-neighbor search. Finally, the model is optimized with a hybrid deep metric learning objective combining Multi-Similarity Loss (MSL), Triplet Margin Loss (TML), Cross-Entropy Loss (CEL), and Spatial Disentanglement Loss (SDL), which encourages sparser and more interpretable feature representations. Experiments on the large-scale public NTU RGB+D 120 dataset show that the proposed method outperforms Skeleton-DML by 3.8%, and by 7.5% when 40 training classes are used. The results demonstrate that, under data scarcity, the proposed method fully exploits the information in compact skeleton sequence representations and improves the matching accuracy of one-shot action recognition.
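The first and fourth steps of the pipeline (mapping a 3D skeleton sequence to a compact image, then classifying a query by nearest-neighbor search against one exemplar embedding per class) can be sketched as below. The array layout, min-max normalization, and cosine distance are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def skeleton_to_image(seq):
    """Map a (T, J, 3) skeleton sequence to a compact image representation:
    frames become rows, joints become columns, xyz become the 3 channels,
    with per-channel min-max normalization to [0, 255]."""
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    img = (seq - lo) / np.maximum(hi - lo, 1e-8) * 255.0
    return img.astype(np.uint8)  # shape (T, J, 3)

def one_shot_classify(query_emb, support_embs, labels):
    """Nearest-neighbor search over one support embedding per class,
    using cosine distance (smaller = more similar)."""
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    dists = 1.0 - s @ q
    return labels[int(np.argmin(dists))]

# Toy example: 40 frames, 25 joints (the NTU RGB+D joint count).
rng = np.random.default_rng(0)
img = skeleton_to_image(rng.normal(size=(40, 25, 3)))
print(img.shape)  # (40, 25, 3)

support = np.array([[1.0, 0.0], [0.0, 1.0]])
print(one_shot_classify(np.array([0.9, 0.1]), support, ["wave", "kick"]))  # wave
```

In the actual model the embeddings come from the backbone plus MLP-TransEmbedder; here the support and query vectors are hand-made stand-ins to isolate the matching step.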
Keywords: action recognition; one-shot learning; metric learning; Transformer; spatial disentanglement
CLC number: TP391.41 [Automation and Computer Technology / Computer Application Technology]