Optimization method of hand pose estimation based on unified view

Authors: Cao Zhongrui; Xie Wenjun; Wang Dong; Niu Lichao; Wang Tingyu; Liu Xiaoping

Affiliations: [1] School of Computer Science & Information Engineering, Hefei University of Technology, Hefei 230601, China; [2] School of Software, Hefei University of Technology, Hefei 230601, China; [3] Anhui Province Key Laboratory of Industry Safety & Emergency Technology, Hefei University of Technology, Hefei 230601, China

Source: Application Research of Computers, 2025, No. 1, pp. 293-299 (7 pages)

Funding: National Natural Science Foundation of China, General Program (62277014); Anhui Provincial Key Research and Development Program (2022f04020006); Fundamental Research Funds for the Central Universities (PA2023GDSK0047).

Abstract: Accurately estimating the three-dimensional pose of the hand from depth images is an important task in computer vision. However, self-occlusion of the hand and the self-similarity of its joints make hand pose estimation extremely challenging. To overcome these difficulties, this paper investigated the impact of the depth image's sampling viewpoint on estimation accuracy and proposed a unified-viewpoint (UVP) network. The network resamples the input depth image to a more easily estimated "front-facing" viewpoint and then improves joint estimation accuracy using features from the original viewpoint. First, a viewpoint transformation module performs a viewpoint rotation on the single input depth image, providing a supplementary second viewpoint. Second, a viewpoint unification loss function ensures that the transformed second viewpoint is the "front-facing" one, minimizing self-occlusion. Finally, lightweight network techniques, such as changing the convolutional combination structure and reducing network depth, further optimize the method's performance. In experiments on three public hand pose datasets (ICVL, NYU, and MSRA), the proposed method achieved average joint position errors of 4.92 mm, 7.43 mm, and 7.02 mm, respectively, and runs at 159.39 frames/s on a computer equipped with an RTX 3070 graphics card. The results show that resampling the viewpoint of the depth map and fusing features from the two viewpoints improve hand pose estimation accuracy. Moreover, the proposed method is adaptive and shows strong generalization: it can be applied to most single-depth-image hand pose estimation models, providing robust support for the application of deep learning to three-dimensional hand pose estimation.
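The geometric step underlying the viewpoint transformation described above can be illustrated by back-projecting a depth image to a point cloud and applying a rigid rotation. The following is a minimal numpy sketch of that geometry only, not the paper's learned viewpoint transformation module; the camera intrinsics (`fx`, `fy`, `cx`, `cy`) and the fixed yaw angle are illustrative assumptions.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth image to an N x 3 point cloud using a
    pinhole camera model; pixels with zero depth are discarded."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    return np.stack([x, y, z], axis=1)[valid]

def rotate_pointcloud(points, yaw_deg):
    """Rotate the point cloud about the camera's y-axis -- a fixed-angle
    stand-in for the learned rotation that produces the second viewpoint."""
    t = np.deg2rad(yaw_deg)
    rot = np.array([[np.cos(t), 0.0, np.sin(t)],
                    [0.0,       1.0, 0.0      ],
                    [-np.sin(t), 0.0, np.cos(t)]])
    return points @ rot.T
```

In the paper's pipeline the rotated point cloud would then be re-rendered as a second depth image from the "front-facing" viewpoint, and features from both views fused for joint regression.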

Keywords: hand pose estimation; hand joint self-occlusion; unified viewpoint; depth image; point cloud transformation

Classification: TP391 [Automation & Computer Technology — Computer Application Technology]

 
