Monocular 3D Human Pose Estimation Algorithm Based on Local Feature Enhancement

Local feature enhancing for 3D human pose estimation


Authors: YAN Xin; WANG Chuangye; GAO Hao [1] (College of Automation & College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210046, China; State Grid Bengbu Power Supply Company, Bengbu 233090, China)

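Affiliations: [1] College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210046, China; [2] State Grid Bengbu Power Supply Company, Bengbu, Anhui 233090, China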

Source: Microelectronics & Computer, 2025, No. 4, pp. 106-113 (8 pages)

Abstract: With the rapid development of computer vision and machine learning, 3D human pose estimation has become a closely watched research direction. Early approaches worked directly on images, but they required substantial computational resources and yielded unsatisfactory results. 2D-to-3D methods were developed to address these problems. The best-performing 2D-to-3D methods today are mostly based on the Transformer architecture, but they concentrate on global feature extraction over the human skeleton and overlook its local differences, so local information is not learned sufficiently. This paper proposes a Transformer-based 3D human pose estimation algorithm that adds a local branch network to the global algorithm. In the local branch, a non-uniform graph convolutional network first extracts spatial semantic features from the 2D human skeleton, helping the network learn the topological structure of the human body. A hierarchical local temporal network then learns the subtle frame-to-frame differences at three levels: joints, body parts, and poses. In the global algorithm, the input passes through spatial and temporal Transformers, which extract the distribution relationships among all keypoints and among all frames, respectively. In the lower layers of the network, the local and global algorithms extract skeleton features in parallel; the higher layers consist of cascaded global algorithm blocks. The method is evaluated on two public datasets, Human3.6M and MPI-INF-3DHP, using the Mean Per Joint Position Error (MPJPE) metric, and achieves 20.8 mm and 22.3 mm, respectively. These results indicate that the algorithm attains a relatively high level of performance.
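For readers unfamiliar with the metric, MPJPE is the average Euclidean distance between each predicted joint and its ground-truth position, taken over all joints and frames. The following is a minimal pure-Python sketch of that definition, not the authors' evaluation code. The function name `mpjpe` and the nested-list input layout are assumptions made for illustration. Benchmark protocols typically align the poses at a root joint before measuring; the abstract does not state which alignment was used, so that step is omitted here.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between matching joints, in the inputs' units.

    predicted, target: sequences of frames; each frame is a sequence of
    (x, y, z) joint positions. Both must have the same layout.
    (Hypothetical helper for illustration; no alignment is applied.)
    """
    if len(predicted) != len(target):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for pred_frame, true_frame in zip(predicted, target):
        if len(pred_frame) != len(true_frame):
            raise ValueError("joint counts differ")
        for pred_joint, true_joint in zip(pred_frame, true_frame):
            # Per-joint position error: straight-line distance in 3D.
            total += math.dist(pred_joint, true_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


# Toy data: one frame, two joints, coordinates in millimetres.
target = [[(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]]
predicted = [[(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]]
print(mpjpe(predicted, target))  # per-joint errors 5 and 12, mean 8.5
```

With coordinates in millimetres the result is also in millimetres, which is the unit of the 20.8 mm and 22.3 mm figures reported in the abstract.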

Keywords: 3D human pose estimation; local information; Transformer

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
