A Human Action Recognition Model Based on Spatio-temporal Graph Convolution and Multimodal Fusion


Author: MA Lei (马磊), School of Information Engineering, Tongren Polytechnic College, Tongren, Guizhou 554300, China

Affiliation: [1] School of Information Engineering, Tongren Polytechnic College, Tongren, Guizhou 554300, China

Source: Journal of Sichuan Vocational and Technical College (《四川职业技术学院学报》), 2025, No. 2, pp. 164-168 (5 pages)

Abstract: Intelligent security, virtual-reality games, and human-robot interaction are important application scenarios for human action recognition. In recent years, multimodal deep neural network models that combine RGB-D video, temporal human-skeleton information, and multi-sensor data fusion have shown excellent performance on human action recognition tasks. Exploiting the temporal variation of skeleton information, this paper proposes a human action recognition model based on spatio-temporal graph convolution and multimodal fusion. The model first uses prior knowledge to fuse sensor data into the features of the skeleton joints, applies graph convolution to process the skeleton features, and uses 3D convolution to process the RGB-D video sequences. After the multimodal features are fused, the output layer applies softmax to produce the final action recognition result. Experimental results on the UTD-MHAD multimodal dataset show that the proposed model improves recognition accuracy by 4.43% on average over the baseline methods.

Keywords: human action recognition; multimodal fusion; graph neural network; sensor applications

Classification Code: TP391.4 [Automation and Computer Technology / Computer Application Technology]
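
The paper itself does not include code. As a rough illustration of the pipeline described in the abstract, the PyTorch sketch below attaches sensor features to every skeleton joint, applies a graph convolution over the joint adjacency, processes the RGB-D clip with 3D convolution, concatenates the two feature streams, and classifies with a softmax output layer. All module names, hidden sizes, the identity adjacency placeholder, and the toy input shapes are assumptions made for illustration, not the authors' implementation; only the defaults of 20 joints and 27 action classes follow the UTD-MHAD setup.

```python
# Hypothetical reconstruction of the described architecture; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkeletonGraphBranch(nn.Module):
    """Spatio-temporal graph branch: sensor features are concatenated onto each
    skeleton joint (the prior-knowledge fusion described in the abstract), then
    propagated over the joint adjacency matrix and pooled over time and joints."""

    def __init__(self, joint_dim, sensor_dim, num_joints, hidden_dim=64):
        super().__init__()
        # Placeholder adjacency (identity); a real model would use the skeleton graph.
        self.register_buffer("adj", torch.eye(num_joints))
        self.proj = nn.Linear(joint_dim + sensor_dim, hidden_dim)
        self.gcn = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, joints, sensors):
        # joints: (B, T, V, joint_dim); sensors: (B, T, sensor_dim)
        B, T, V, _ = joints.shape
        sensors = sensors.unsqueeze(2).expand(-1, -1, V, -1)   # broadcast to every joint
        x = torch.cat([joints, sensors], dim=-1)               # joint-level multimodal features
        x = F.relu(self.proj(x))
        x = torch.einsum("vw,btwc->btvc", self.adj, x)         # graph convolution over joints
        x = F.relu(self.gcn(x))
        return x.mean(dim=(1, 2))                              # (B, hidden_dim)


class RGBDBranch(nn.Module):
    """3D-convolutional branch for the RGB-D clip (4 input channels: RGB + depth)."""

    def __init__(self, hidden_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(16, hidden_dim)

    def forward(self, clip):
        # clip: (B, 4, T, H, W)
        x = self.conv(clip).flatten(1)
        return F.relu(self.fc(x))


class MultimodalActionModel(nn.Module):
    """Concatenates both branches and classifies with a softmax output layer."""

    def __init__(self, joint_dim=3, sensor_dim=6, num_joints=20, num_classes=27):
        super().__init__()
        self.skeleton = SkeletonGraphBranch(joint_dim, sensor_dim, num_joints)
        self.rgbd = RGBDBranch()
        self.head = nn.Linear(64 + 64, num_classes)

    def forward(self, joints, sensors, clip):
        fused = torch.cat([self.skeleton(joints, sensors), self.rgbd(clip)], dim=-1)
        return F.softmax(self.head(fused), dim=-1)


if __name__ == "__main__":
    model = MultimodalActionModel()
    joints = torch.randn(2, 16, 20, 3)     # batch of 2, 16 frames, 20 joints, xyz
    sensors = torch.randn(2, 16, 6)        # inertial stream (accelerometer + gyroscope)
    clip = torch.randn(2, 4, 16, 32, 32)   # RGB-D clip
    print(model(joints, sensors, clip).shape)  # torch.Size([2, 27])
```

In training practice the softmax would normally be dropped in favor of raw logits with a cross-entropy loss; it is kept here only to mirror the abstract's description of the output layer.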
