Lightweight Human Pose Estimation with Cascaded Channel Attention  (Cited by: 2)

Authors: LIN Yuanqiang, GAO Hui [1], WANG Peng [2], LYU Zhigang [1], LI Xiaoyan [1], WANG Chu

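Affiliations: [1] School of Electronics and Information Engineering, Xi'an Technological University, Xi'an 710021, China; [2] Development Planning Department, Xi'an Technological University, Xi'an 710021, China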

Source: Computer Engineering and Applications (《计算机工程与应用》), 2024, No. 13, pp. 219-227 (9 pages)

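Funding: National Natural Science Foundation of China (62171360); Key R&D Program of the Shaanxi Provincial Department of Science and Technology (2022GY-110); Xi'an Key Laboratory of Intelligent Weapons (2019220514SYS020CG042); 2022 Shaanxi University Youth Innovation Team Project.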

Abstract: To address the severe accuracy loss that current human pose estimation models suffer when they are made lightweight, a lightweight human pose estimation model with cascaded channel attention is proposed, using the high-resolution network (HRNet) as the baseline. First, a cascaded channel attention that preserves internal high-resolution features is constructed; it learns the importance of each channel of the input features to improve the model's representational capacity. Second, a lightweight depthwise convolutional transform module based on the MetaFormer structure is designed to replace the computationally expensive residual modules in stages 2, 3, and 4 of HRNet. Third, a multi-scale feature fusion method is designed to reduce the loss of semantic information from multi-dimensional features in HRNet's original fusion method. Finally, unbiased data processing is adopted to eliminate the offset error introduced when encoding keypoint heatmaps. Experimental results on the COCO2017 validation set show that, compared with the baseline model, the proposed model reduces the number of parameters and floating-point operations by 90.2% and 83.1%, respectively, at the cost of a 2-percentage-point drop in AP, and with an AP of 71.4% it achieves the best accuracy among lightweight models.

Keywords: human pose estimation; lightweight; channel attention; MetaFormer structure; multi-scale feature fusion

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]
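
Note on channel attention: the abstract says that the attention module learns the importance of each channel of the input features, but it does not give the cascaded design itself. The sketch below is therefore not the authors' module. It is a generic squeeze-and-excitation style channel attention in plain Python, included only to illustrate what per-channel reweighting means. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy input are all hypothetical.

```python
import math


def channel_attention(x, w1, w2):
    """Generic squeeze-and-excitation style channel attention (illustrative only).

    x  : feature map as nested lists with shape [C][H][W]
    w1 : bottleneck weights with shape [C_mid][C]   (hypothetical values)
    w2 : expansion weights with shape [C][C_mid]    (hypothetical values)
    Returns the rescaled feature map and the per-channel weights in (0, 1).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: fully connected layer followed by a sigmoid.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: every spatial position of a channel is multiplied by that channel's weight.
    out = [[[v * a for v in row] for row in ch] for ch, a in zip(x, weights)]
    return out, weights


# Toy input: 2 channels of size 2x2 (made-up numbers).
x = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
w1 = [[0.5, 0.5]]       # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]    # expands back to 2 channels
out, weights = channel_attention(x, w1, w2)
```

The output keeps the input's shape; only the relative magnitude of each channel changes, which is why this kind of module adds few parameters to a lightweight backbone.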

 
