Human Pose Estimation with Multi-Scale and Multi-Level Feature Fusion

Authors: WANG Yanni[1], HU Min, HAN Shipeng, CHEN Yixuan, LYU Hao

Affiliations: [1] School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; [2] Department of Military Biomedical Engineering, Air Force Medical University of PLA, Xi'an 710032, China

Source: Computer Engineering and Applications, 2025, No. 6, pp. 199-209 (11 pages)

Funding: National Natural Science Foundation of China (61803294); Natural Science Basic Research Program of Shaanxi Province (2020JM499, 2020JQ684).

Abstract: Gains in human pose estimation accuracy usually depend on feature fusion, but existing fusion strategies often ignore the interaction between scale features and level features, and fusing along only one of these dimensions may yield less expressive features. To make full use of the complementarity between different features, a new feature fusion network for human pose estimation is proposed: the multi-scale and multi-level network (MSLNet). MSLNet uses the high-resolution network (HRNet) as its backbone and exchanges information between feature maps of different resolutions through cross-scale interaction, obtaining pose features that are both fine-grained and coarse-grained. An expectation maximization attention-bidirectional feature pyramid network (EMA-BiFPN) is introduced to aggregate multi-level features after multi-scale fusion, capturing the details and correlations of human pose from local to global. Finally, a keypoint detection head built from residual structures performs the final fusion of the output features and improves the accuracy of human keypoint detection. Experimental results show that MSLNet achieves the best accuracy, 75.8% on COCO and 91.1% on MPII, confirming that it can exploit the complementary features between scales and levels to improve human pose estimation accuracy.
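The abstract names expectation maximization attention (EMA) as part of EMA-BiFPN but does not describe how it works. The NumPy sketch below follows the general expectation-maximization attention formulation from EMANet (ICCV 2019), not the authors' code: a small set of bases is refined by alternating an E-step (soft assignment of every pixel to the bases) and an M-step (each basis re-estimated as a responsibility-weighted mean of the pixels), after which every pixel is reconstructed from the compact bases and added back to the input. The function name `em_attention` and the parameters `num_bases`, `iters` and `lam` are hypothetical; learned 1x1 convolutions, normalisation layers and the running update of the bases used in practice are omitted.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def em_attention(feat: np.ndarray, num_bases: int = 8, iters: int = 3,
                 lam: float = 1.0, seed: int = 0) -> np.ndarray:
    """Hypothetical EM attention on a (C, H, W) feature map.

    Simplified sketch: random initial bases, no learned convolutions,
    no normalisation layers, no moving-average update of the bases.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T  # (N, C): one row per pixel
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal((num_bases, c))
    mu /= np.linalg.norm(mu, axis=1, keepdims=True)  # L2-normalised bases, (K, C)
    for _ in range(iters):
        # E-step: responsibility of each basis for each pixel, (N, K)
        z = softmax(lam * (x @ mu.T), axis=1)
        # M-step: bases become responsibility-weighted means of the pixels
        z_norm = z / (1e-6 + z.sum(axis=0, keepdims=True))
        mu = z_norm.T @ x  # (K, C)
        mu /= 1e-6 + np.linalg.norm(mu, axis=1, keepdims=True)
    # Re-estimation: rebuild every pixel from the compact set of bases
    x_rec = z @ mu  # (N, C)
    # Residual connection back to the input feature map
    return feat + x_rec.T.reshape(c, h, w)
```

Because the pixels are summarised by only `num_bases` vectors, the attention costs O(N·K) rather than the O(N²) of full self-attention, which is the usual motivation for this design.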

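BiFPN, as introduced with EfficientDet, fuses the inputs of each node with learnable non-negative weights using "fast normalized fusion", O = sum_i(w_i * I_i) / (eps + sum_j w_j), and runs a top-down pass followed by a bottom-up pass across pyramid levels. The sketch below applies that rule to three hypothetical levels with equal channel counts, resampling by nearest-neighbour upsampling and 2x2 max pooling. The helper names and the keys of the weight dictionary are hypothetical, the convolutions that follow each fusion in a real BiFPN are omitted, and the abstract does not state exactly how MSLNet inserts EMA into this structure.

```python
import numpy as np


def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """BiFPN-style fusion: sum_i w_i * I_i / (eps + sum_j w_j)."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights >= 0
    fused = sum(wi * x for wi, x in zip(w, inputs))
    return fused / (eps + w.sum())


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """2x2 max pooling of a (C, H, W) map with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def bifpn_layer(p3, p4, p5, w):
    """One top-down + bottom-up pass over three levels (p3 is the finest).

    `w` maps hypothetical node names to lists of scalar fusion weights.
    """
    # Top-down path: coarse semantic context flows to finer levels
    p4_td = fast_normalized_fusion([p4, upsample2x(p5)], w["p4_td"])
    p3_out = fast_normalized_fusion([p3, upsample2x(p4_td)], w["p3_out"])
    # Bottom-up path: fine spatial detail flows back to coarser levels
    p4_out = fast_normalized_fusion([p4, p4_td, downsample2x(p3_out)], w["p4_out"])
    p5_out = fast_normalized_fusion([p5, downsample2x(p4_out)], w["p5_out"])
    return p3_out, p4_out, p5_out
```

With every weight set to 1, each node reduces to a near-plain average of its inputs; training lets the network learn how much each resolution contributes, while the division keeps the fused output on the same scale as its inputs.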
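Keywords: high-resolution network (HRNet); human pose estimation; expectation maximization attention; bidirectional feature pyramid network; feature fusion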

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
