Authors: WANG Yanni [1]; HU Min; HAN Shipeng; CHEN Yixuan; LYU Hao
Affiliations: [1] School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; [2] Department of Military Biomedical Engineering, Air Force Medical University of PLA, Xi'an 710032, China
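Source: Computer Engineering and Applications (《计算机工程与应用》), 2025, No. 6, pp. 199-209 (11 pages)
Funding: National Natural Science Foundation of China (61803294); Natural Science Basic Research Program of Shaanxi Province (2020JM499, 2020JQ684).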
Abstract: Accuracy gains in human pose estimation usually depend on feature fusion, yet existing fusion strategies often ignore the interaction between scale features and level features, and fusing in a single mode can yield less expressive features. To exploit the complementarity between different features, a multi-scale and multi-level feature fusion network (MSLNet) is proposed. A high-resolution network (HRNet) serves as the backbone; cross-scale information exchange between feature maps of different resolutions yields pose features that are both fine-grained and coarse-grained. An expectation maximization attention bidirectional feature pyramid network (EMA-BiFPN) is introduced to perform multi-level feature aggregation after multi-scale feature fusion, capturing the details and correlations of human pose from local to global. A keypoint detection head built from residual structures completes the final fusion of the output features and improves human keypoint detection accuracy. Experimental results show that MSLNet achieves 75.8% accuracy on COCO and 91.1% on MPII, which the authors report as the best accuracy, supporting the claim that fusing complementary scale and level features improves human pose estimation accuracy.
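Keywords: high-resolution network (HRNet); human pose estimation; expectation maximization attention; bidirectional feature pyramid network; feature fusion
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]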