Authors: Zhang Yu [1], Liu Li [1,2], Fu Xiaodong, Liu Lijun [1,2], Peng Wei
Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; [2] Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology, Kunming 650500
Source: Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), 2025, No. 1, pp. 75-88 (14 pages)
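Funding: National Natural Science Foundation of China (62262036, 61962030); Yunnan Province Reserve Talent Training Program for Young and Middle-Aged Academic and Technical Leaders (202005AC160036).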
Abstract: To address inaccurate representation, inadequate fusion, and unsmooth results in video-based single-person three-dimensional human pose estimation, a multi-scale spatial-temporal feature fusion method is proposed. First, joint, limb, and upper/lower body tokens are defined in the spatial domain, and positional embeddings are used to represent the spatial multi-scale features of the human body. Second, a spatial multi-scale feature fusion module is built from a self-attention mechanism and a multilayer perceptron; it fuses the joint, limb, and upper/lower body features to obtain an initial pose feature sequence. Finally, a temporal multi-scale encoding is established for temporal feature fusion to obtain the final pose feature sequence, and temporal decoding refines it into the output three-dimensional human pose. Experimental results on the Human3.6M dataset show that the method reaches a mean per-joint position error (P-MPJPE) of 33.6 and a mean per-joint velocity error (MPJVE) of 2.4, which are 2.3% and 4.0% lower than the compared methods. The proposed method reduces computational cost, improves three-dimensional human pose estimation accuracy, and generates accurate and smooth results. In addition, test results on the HumanEva-I dataset show that the method has a certain degree of generalization ability.
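The abstract reports P-MPJPE and MPJVE without defining them. The following is a minimal sketch, assuming the definitions commonly used in the 3D pose estimation literature: position error is the mean Euclidean distance between predicted and ground-truth joints, and velocity error is the same distance computed on frame-to-frame joint displacements. The function names `mpjpe`, `velocities`, and `mpjve` are hypothetical, and the paper's own evaluation protocol may differ. P-MPJPE additionally aligns the prediction to the ground truth with a rigid (Procrustes) transform before measuring the error; that alignment step is omitted here.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: sequences of frames; each frame is a sequence of (x, y, z) joints.
    Returns the mean Euclidean distance over all joints and frames.
    """
    total = 0.0
    count = 0
    for frame_p, frame_g in zip(pred, gt):
        for p, g in zip(frame_p, frame_g):
            total += math.dist(p, g)
            count += 1
    return total / count

def velocities(seq):
    """First-order temporal differences: joint displacement between consecutive frames."""
    return [
        [tuple(c1 - c0 for c0, c1 in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]

def mpjve(pred, gt):
    """Mean per-joint velocity error: MPJPE applied to the per-frame displacements."""
    return mpjpe(velocities(pred), velocities(gt))

if __name__ == "__main__":
    # Toy data (made up for illustration): 3 frames, 2 joints moving along y.
    gt = [[(0, 0, 0), (1, 0, 0)], [(0, 1, 0), (1, 1, 0)], [(0, 2, 0), (1, 2, 0)]]
    # A prediction with a constant offset has a position error but no velocity error.
    pred = [[(x + 3, y + 4, z) for (x, y, z) in frame] for frame in gt]
    print(mpjpe(pred, gt), mpjve(pred, gt))
```

A constant offset between prediction and ground truth changes the position error but leaves the velocity error at zero, which is why the two metrics are reported together: the first measures accuracy, the second measures temporal smoothness.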
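Keywords: three-dimensional human pose estimation; multi-scale features; self-attention mechanism; spatial-temporal feature fusion; temporal encoding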
Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]