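Affiliations: [1] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094; [2] School of Design Arts and Media, Nanjing University of Science and Technology, Nanjing 210094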
Source: Journal of Image and Graphics (《中国图象图形学报》), 2024, No. 12, pp. 3529-3542 (14 pages)
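Funding: National Natural Science Foundation of China (62172222); National Key Research and Development Program of China, International Cooperation Special Project (SQ2023YFE0102775).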
Abstract: Human pose estimation (HPE) is a fundamental task in computer vision that aims to obtain the spatial coordinates of human joints from a given image. It is widely used in action recognition, semantic segmentation, human-computer interaction, and person re-identification. With the rise of deep convolutional neural networks (DCNNs), HPE has made remarkable progress. Nevertheless, it remains a challenging task, especially in the presence of complex poses, variations in keypoint scale, and occlusion. To summarize the development of HPE techniques under occlusion, this paper systematically reviews representative methods published since 2018. According to the training data, model structure, and output of the neural network, the methods are divided into three categories: preprocessing based on data augmentation, structural design based on feature discrimination, and result optimization based on human body priors. Data augmentation methods generate occluded data to enlarge the training set; feature discrimination methods use attention mechanisms and similar techniques to suppress interfering features; human structure prior methods use priors on human body structure to refine occluded poses. In addition, to better evaluate performance under occlusion, the MSCOCO (Microsoft common objects in context) val2017 dataset is re-annotated. Finally, the methods are compared and summarized, clarifying their strengths and weaknesses under occlusion, and on this basis the reasons why HPE is difficult under occlusion and the future trends of the field are summarized and discussed.
Human pose estimation (HPE) is a prominent area of research in computer vision whose primary goal is to accurately localize annotated keypoints of the human body, such as wrists and eyes. This fundamental task serves as the basis for numerous downstream applications, including human action recognition, human-computer interaction, pedestrian re-identification, video surveillance, and animation generation. Thanks to the powerful nonlinear mapping capabilities of convolutional neural networks, HPE has advanced notably in recent years. Despite this progress, HPE remains a challenging task, particularly when facing complex postures, variations in keypoint scale, occlusion, and other factors. Notably, current heatmap-based methods suffer severe performance degradation under occlusion, which remains a critical challenge in HPE because diverse human postures, complex backgrounds, and various occluding objects can all degrade performance. To review recent advances in occlusion-aware HPE comprehensively, this paper examines both the difficulties of predicting occluded keypoints and the reasons behind them. The first identified challenge is the lack of annotated occluded data. Annotating occluded data is inherently complex and demanding. Most prevalent HPE datasets focus predominantly on visible keypoints, and only a few address and annotate occlusion scenarios. This shortage of annotated occluded data during training significantly compromises the robustness of models in situations that involve partial or complete obstruction of body keypoints. Feature confusion is a key challenge for top-down HPE methods, which rely on detected bounding boxes to crop the target person's region from the image for keypoint prediction. However, in the presence of occlusion, these detection boxes may include individuals other than the target person, thereby
Keywords: human pose estimation (HPE); occlusion; data augmentation; human structure prior; insufficient annotated occlusion data
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]