Authors: JIANG Yiheng (江以恒); LI Yang (李洋); LIU Chunyan (刘春颜); ZHAO Yunlong (赵蕴龙)
Affiliations: [1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China [2] Unmanned Aerial Vehicles Research Institute, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
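Source: Computer Science (《计算机科学》), 2025, No. 3, pp. 68-76 (9 pages)
Funding: National Science and Technology Major Project for New Generation Artificial Intelligence (2022ZD0115403).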
Abstract: Multi-view multi-person 3D human pose estimation is widely used in computer vision tasks. Current spatial voxel-based methods consume so many resources that real-time operation on edge computing devices is difficult to achieve, while regression methods lack geometric constraints and therefore generalize poorly: they cannot be applied directly in a new environment and require newly collected data for fine-tuning. By combining the spatial voxel method with regression-based pose estimation and merging the characteristics of both, we propose a multi-view multi-person 3D human pose estimation model based on center point attention regression. The model uses a small-scale voxel network to roughly estimate the position of each human body center and constructs an initial pose from it, then performs regression prediction within the range of the human body center point to obtain a more accurate pose. By incorporating spatial keypoint positions, the model's regression prediction becomes more accurate, with average accuracy improving by 1.16% at large scales. The model is also very easy to train, with accuracy improving by up to 12% in small-sample fine-tuning. This allows regression-based models to be deployed quickly in new scenes after training on a small amount of data, greatly improving their generalization performance and versatility.
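Keywords: 3D human pose estimation; multi-view; center point prediction network; center point attention; Transformer; voxel network
Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]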
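The coarse-to-fine idea in the abstract (locate a body center on a small voxel grid, build an initial pose around it, then keep the refinement near that center) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: every function name, the fixed joint template, the grid bounds, and the single-person simplification are assumptions made for illustration, and the learned voxel network and attention-based regressor are replaced by a hand-made heatmap and a simple clamp.

```python
import numpy as np

def voxel_centers(grid_min, grid_max, shape):
    """World coordinates of every voxel, as an array of shape (X, Y, Z, 3)."""
    axes = [np.linspace(lo, hi, n) for lo, hi, n in zip(grid_min, grid_max, shape)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)

def coarse_center(heatmap, coords):
    """Pick the voxel with the highest score as the rough body center.

    Assumption: one person only. A multi-person model would keep several peaks.
    """
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return coords[idx]

def initial_pose(center, template_offsets):
    """Place a fixed joint template (J, 3) around the estimated center."""
    return center[None, :] + template_offsets

def clamp_to_center(pose, center, radius):
    """Keep refined joints inside a cube of half-width `radius` around the center."""
    return np.clip(pose, center - radius, center + radius)

# Hypothetical 9 x 9 x 3 grid covering an 8 m x 8 m x 2 m space.
coords = voxel_centers((-4.0, -4.0, 0.0), (4.0, 4.0, 2.0), (9, 9, 3))
heatmap = np.zeros((9, 9, 3))
heatmap[6, 2, 1] = 1.0  # stand-in for the voxel network's output

center = coarse_center(heatmap, coords)
# Illustrative three-joint template: head, pelvis, feet (offsets in meters).
template = np.array([[0.0, 0.0, 0.6], [0.0, 0.0, 0.0], [0.0, 0.0, -0.9]])
pose = initial_pose(center, template)
# A learned regressor would update `pose` here; the clamp bounds its output.
refined = clamp_to_center(pose + 0.05, center, radius=1.0)
```

Because the grid is small, the center search is cheap, and restricting refinement to a neighborhood of the center is what would supply the geometric constraint that plain regression lacks.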