Authors: WANG Yu-ping (王玉萍) [1]; ZENG Yi (曾毅) [1]; LI Sheng-hui (李胜辉); ZHANG Lei (张磊)
Affiliations: [1] School of Information Engineering, Zhengzhou University of Science and Technology, Zhengzhou, Henan 450064, China; [2] College of Big Data, Henan Electromechanical Vocational College, Zhengzhou, Henan 450064, China; [3] School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China
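Source: Journal of Graphics (《图学学报》), 2023, No. 1, pp. 139-145 (7 pages)
Funding: Science and Technology Research Project of the Henan Provincial Department of Science and Technology (222102210174).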
Abstract: 3D human pose estimation is the foundation of human behavior understanding, but predicting plausible 3D human pose sequences remains challenging. To address this, a Transformer-based 3D human pose estimation method is proposed that uses multi-layer long short-term memory (LSTM) units and a multi-scale Transformer structure to improve the accuracy of human pose sequence prediction. First, a time-series-based generator is designed, which extracts image features with a pre-trained ResNet. Second, multi-layer LSTM units learn the relationships between human poses across temporally continuous image sequences and output a plausible sequence of skinned multi-person linear (SMPL) human model parameters. Finally, a discriminator based on a multi-scale Transformer is constructed; the multi-scale structure learns detailed features at several segmentation granularities, and the Transformer block in particular encodes relative position to strengthen local feature learning. Experimental results show that the method predicts more accurately than VIBE: its mean per-joint position error (MPJPE) is 7.5% lower than VIBE's on the 3DPW dataset and 1.8% lower on the MPI-INF-3DHP dataset.
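Keywords: multi-scale Transformer structure; LSTM unit; time series; attention mechanism; 3D pose estimation
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]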
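The abstract reports its results as relative reductions in MPJPE. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the Python below averages the Euclidean distance between predicted and ground-truth 3D joints, and expresses one error as a percentage reduction relative to a baseline. The function names `mpjpe` and `relative_reduction`, the two-joint skeleton, and all coordinates are illustrative assumptions, not values from the paper.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: the average Euclidean distance
    between predicted and ground-truth 3D joints (same unit as the input)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Toy data (hypothetical, not from the paper): two joints, coordinates in mm.
gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]

error = mpjpe(pred, gt)  # per-joint distances are 5 mm and 12 mm
print(f"MPJPE: {error:.2f} mm")
```

Published evaluations usually also align the root joint before measuring the error and average over all frames of a sequence; both steps are omitted here for brevity.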