A Transformer-based 3D human pose estimation method (Cited by: 5)

Authors: WANG Yu-ping [1]; ZENG Yi [1]; LI Sheng-hui; ZHANG Lei

Affiliations: [1] School of Information Engineering, Zhengzhou University of Science and Technology, Zhengzhou, Henan 450064, China; [2] College of Big Data, Henan Electromechanical Vocational College, Zhengzhou, Henan 450064, China; [3] School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China

Source: Journal of Graphics, 2023, No. 1, pp. 139-145 (7 pages)

Funding: Science and Technology Research Project of the Henan Provincial Department of Science and Technology (222102210174).

Abstract: 3D human pose estimation is the foundation of human behavior understanding, but predicting plausible 3D human pose sequences remains challenging. To address this, a Transformer-based 3D human pose estimation method is proposed that uses multi-layer long short-term memory (LSTM) units and a multi-scale Transformer structure to improve the accuracy of human pose sequence prediction. First, a time-series-based generator is designed, which extracts image features with a pre-trained ResNet. Second, multi-layer LSTM units learn the relationships between human poses across temporally continuous image sequences and output a plausible sequence of skinned multi-person linear (SMPL) human model parameters. Finally, a discriminator based on a multi-scale Transformer is constructed; the multi-scale structure learns detailed features at multiple segmentation granularities, and the Transformer block in particular encodes relative positions to strengthen local feature learning. Experimental results show that the method predicts more accurately than VIBE: its mean per joint position error (MPJPE) is 7.5% lower than VIBE's on the 3DPW dataset and 1.8% lower on the MPI-INF-3DHP dataset.
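The abstract reports results as relative reductions in MPJPE. As a minimal sketch of how such a figure is commonly computed (not the authors' evaluation code), the snippet below averages the Euclidean distance between predicted and ground-truth joints and then expresses the improvement over a baseline as a percentage. The function names `mpjpe` and `relative_reduction` and the toy coordinates are assumptions for illustration; real benchmark protocols usually also align the root joint before measuring, which is omitted here.

```python
import math


def mpjpe(pred, gt):
    """Mean per joint position error.

    pred, gt: nested lists shaped [frames][joints][3], in the same unit (e.g. mm).
    Returns the Euclidean joint distance averaged over all frames and joints.
    """
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # Euclidean distance between two 3D points
            count += 1
    return total / count


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Toy data (hypothetical): one frame, two joints, coordinates in mm.
gt = [[[0.0, 0.0, 0.0], [0.0, 100.0, 0.0]]]
pred = [[[3.0, 4.0, 0.0], [0.0, 100.0, 12.0]]]

error = mpjpe(pred, gt)  # joint errors are 5 mm and 12 mm, so the mean is 8.5 mm
```

With a hypothetical baseline MPJPE of 100.0 mm and a new MPJPE of 92.5 mm, `relative_reduction(100.0, 92.5)` gives 7.5, which is the kind of percentage quoted in the abstract.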

Keywords: multi-scale Transformer structure; LSTM unit; time series; attention mechanism; 3D pose estimation

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
