Multi-Scale Spatial-Temporal Feature Fusion for 3D Human Pose Estimation (三维人体姿态估计中的多尺度时空特征融合)

Authors: Zhang Yu [1]; Liu Li [1,2]; Fu Xiaodong; Liu Lijun [1,2]; Peng Wei

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; [2] Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology, Kunming 650500

Source: Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), 2025, No. 1, pp. 75-88 (14 pages)

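Funding: National Natural Science Foundation of China (62262036, 61962030); Yunnan Province Reserve Talent Training Program for Young and Middle-Aged Academic and Technical Leaders (202005AC160036).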

Abstract: To address inaccurate representation, insufficient fusion, and unsmooth results in video-based single-person 3D human pose estimation, a multi-scale spatial-temporal feature fusion method is proposed. First, joint, limb, and upper/lower-body tokens are defined in the spatial domain, and positional embeddings are used to represent the spatial multi-scale features of the human body. Second, a spatial multi-scale feature fusion module built from a self-attention mechanism and multilayer perceptrons fuses the joint, limb, and upper/lower-body features into an initial pose feature sequence. Finally, temporal multi-scale encoding performs temporal feature fusion to obtain the final pose feature sequence, which temporal decoding then refines into the 3D human pose. Experiments on the Human3.6M dataset show that the proposed method achieves a mean per-joint position error (P-MPJPE) of 33.6 and a joint velocity error (MPJVE) of 2.4, which are 2.3% and 4.0% lower than those of the compared methods. The method reduces computational complexity, improves estimation accuracy, and produces accurate, smooth 3D poses. Tests on the HumanEva-I dataset further indicate that the method has a certain degree of generalization ability.
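The two error measures reported in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: a position error is the mean Euclidean distance between predicted and ground-truth joints, and a velocity error is the same distance applied to frame-to-frame differences. The function names `mpjpe`, `velocities`, and `mpjve` and the toy data are hypothetical and are not taken from the paper. P-MPJPE is usually computed after rigidly aligning the prediction to the ground truth (Procrustes alignment), a step this sketch omits.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance over all joints and frames.

    pred, gt: lists of frames; each frame is a list of (x, y, z) joints.
    Assumption: no rigid alignment is applied (so this is not P-MPJPE).
    """
    total, count = 0.0, 0
    for frame_p, frame_g in zip(pred, gt):
        for p, g in zip(frame_p, frame_g):
            total += math.dist(p, g)
            count += 1
    return total / count

def velocities(seq):
    """First-order differences of each joint between consecutive frames."""
    return [
        [tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]

def mpjve(pred, gt):
    """Velocity error: the position error applied to joint velocities."""
    return mpjpe(velocities(pred), velocities(gt))

# Toy data: a static two-joint "skeleton" over three frames.
gt = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)] for _ in range(3)]

# A prediction shifted by a constant offset: large position error, zero velocity error.
shifted = [[(x + 3.0, y + 4.0, z) for (x, y, z) in frame] for frame in gt]
print(mpjpe(shifted, gt))  # 5.0
print(mpjve(shifted, gt))  # 0.0

# A prediction that jitters in one frame: small position error, larger velocity error.
jitter = [list(frame) for frame in gt]
jitter[1][0] = (0.0, 0.0, 1.0)
print(mpjpe(jitter, gt), mpjve(jitter, gt))
```

The toy cases show why a velocity measure is reported alongside a position measure: a constant offset is invisible to the velocity error, while single-frame jitter barely moves the position error but is penalized by the velocity error, which is what "smoothness" refers to.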

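Keywords: 3D human pose estimation; multi-scale features; self-attention mechanism; spatial-temporal feature fusion; temporal encoding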

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]
