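Affiliations: [1] Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian 116622 [2] School of Safety and Management Engineering, Hunan Institute of Technology, Hengyang 421002 [3] School of Computer Science and Technology, Dalian University of Technology, Dalian 116024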
Source: Journal of Image and Graphics (《中国图象图形学报》), 2025, No. 1, pp. 254-267 (14 pages)
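Funding: 111 Project (D23006); Liaoning Provincial University Innovation Team Support Program (LT2020015); Dalian Key Field Innovation Team Support Program (2021RT06); Dalian Major Basic Research Project (2023JJ11CG002); Scientific Research Program of the Education Department of Liaoning Province (LJKMZ20221839); Dalian University Innovation Team Support Program (XLJ202010); Dalian University Interdisciplinary Project (DLUXK-2024-YB007); Natural Science Foundation of Hunan Province (2022JJ30017).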
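Abstract: Objective: Multi-view 3D human pose estimation can estimate the depth of each joint from 2D images taken from multiple viewpoints, overcoming the ill-posedness caused by occlusion and depth ambiguity in monocular 3D human pose estimation. However, when system performance is constrained by the quality of the 2D pose estimates, further improving the final 3D estimation accuracy is difficult. To address this, we propose CFJCNet (controlled fusion and joint correlation network), a 3D human pose estimation algorithm that combines controllable multi-view fusion with joint correlation. It consists of three parts: a multi-view fusion optimization module, a 2D pose refinement module, and a structural triangulation module. Method: First, the controllable multi-view fusion optimization module, built on an epipolar geometry framework, selectively applies epipolar geometry to improve the quality of the estimated 2D heatmaps while limiting the noise introduced. Next, a 2D pose refinement method that jointly learns with graph convolution and attention mechanisms uses the relationships among joints within a single view as a constraint, so that it better learns global and local information about the human body and refines the 2D pose estimates. Finally, structural triangulation is introduced to obtain prior knowledge of human bone lengths, which is embedded in the 3D reconstruction process to improve 3D human pose estimation. Results: The algorithm was evaluated on two public datasets, Human3.6M and Total Capture, and one synthetic dataset, Occlusion-Person, where it achieved mean joint errors of 17.1 mm, 18.7 mm, and 10.2 mm, respectively, clearly better than existing multi-view 3D human pose estimation algorithms. Conclusion: This paper proposes a multi-view 3D human pose estimation algorithm that builds consistency relations for human joints across views as well as the intrinsic topological constraints of the human skeleton within each view. It refines the 2D estimates, corrects erroneous poses, effectively improves the accuracy of 3D human pose estimation, and achieves the best estimation results.
Objective: 3D human pose estimation is fundamental to understanding human behavior and aims to estimate 3D joint points from images or videos. It is widely used in downstream tasks such as human-computer interaction, virtual fitting, autonomous driving, and pose tracking. According to the number of cameras, 3D human pose estimation can be divided into monocular and multi-view 3D human pose estimation. Occlusion and depth ambiguity make the monocular problem ill-posed, so estimating 3D human joint points from a single view is difficult. Multi-view 3D human pose estimation, however, can obtain the depth of each joint from multiple images, which overcomes this problem. Most recent methods use a triangulation module to estimate the 3D joint positions by lifting their 2D counterparts, measured in multiple images, into 3D space. This module is usually used in a two-stage procedure: first, the 2D joint coordinates of the person in each view are estimated separately with a 2D pose detector; then the 3D pose is recovered from the multi-view 2D poses by triangulation. On this basis, some methods use epipolar geometry to fuse human joint features and establish correlations among multiple views, which can improve the accuracy of 3D estimation. However, when the system performance is constrained by the effectiveness of the 2D estimation results, improving the final 3D estimation accuracy further is difficult. Therefore, to extract human contextual information for more effective 2D features, we construct a novel 3D pose estimation network that explores the correlation of the same joint among multiple views and the correlation between neighboring joints within a single view. Method: In this paper, we propose a 3D human pose estimation method based on multi-view controllable fusion and joint correlation (CFJCNet), which includes three parts: a controllable multi-view fusion optimization module, a 2D pose refinement module, and a structural triangulation module. First, a set of RGB images capt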
Keywords: multi-view; 3D human pose estimation; joint correlation; graph convolutional network (GCN); attention mechanism; triangulation
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
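The two-stage procedure mentioned in the abstract (per-view 2D joint estimates, followed by triangulation) can be illustrated with a minimal sketch. This is plain linear (DLT) triangulation, not the structural triangulation module of CFJCNet, whose details the abstract does not give. The function names `triangulate_joint`, `project`, and `mpjpe`, the synthetic cameras, and the reading of "mean joint error" as mean per-joint position error are all assumptions made for illustration.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (3,) with a 3x4 projection matrix P to pixels (2,)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_joint(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint seen in V >= 2 calibrated views.

    proj_mats: (V, 3, 4) projection matrices; points_2d: (V, 2) pixel coordinates.
    Returns the 3D point (3,) that minimizes the algebraic error.
    """
    rows = []
    for P, (u, v) in zip(np.asarray(proj_mats, float), np.asarray(points_2d, float)):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints, (J, 3) each."""
    return float(np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=-1).mean())


# Hypothetical two-camera setup: shared intrinsics, second camera offset along x.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.1, -0.2, 3.0])
uv = np.stack([project(P1, X_true), project(P2, X_true)])
X_est = triangulate_joint([P1, P2], uv)
```

With noise-free 2D inputs, `X_est` matches `X_true` up to floating-point error. With noisy 2D detections the error in `uv` propagates directly into `X_est`, which is the dependence on 2D estimation quality that the abstract identifies as the bottleneck.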