End-to-End Trainable Head Pose Estimation with Vision Transformer Based on Multi-Scale Dilated Separable Convolution

Original title: 基于多尺度空洞可分离卷积的视觉Transformer的端到端可训练头部姿态估计


Author: Jingjing Yao (尧京京), School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai

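Affiliation: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai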

Source: Modeling and Simulation (《建模与仿真》), 2025, Issue 3, pp. 426-434 (9 pages)

Abstract: In this paper, we propose a new method for head pose estimation from RGB images, built on the Hopenet network and the Vision Transformer. The architecture consists of three key components: (1) a backbone network, (2) a Vision Transformer, and (3) a prediction head. We further improve the backbone network by incorporating multi-scale dilated separable convolutions to strengthen feature extraction. Compared with feature extraction in conventional convolutional neural networks and Vision Transformers, our backbone preserves critical information more effectively while reducing image resolution. Ablation studies confirm that the backbone based on multi-scale dilated separable convolutions retains features better than conventional deep convolutional networks and Vision Transformer architectures. We conduct comprehensive experiments and ablation studies on the 300W-LP and AFLW2000 datasets. The results show that the proposed method significantly improves both accuracy and robustness in head pose estimation over Hopenet and over some methods based on the Transformer encoder, such as HeadPosr.
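The abstract does not give the backbone's layer configuration, so the following is only a back-of-the-envelope sketch of why a multi-scale dilated separable design can be attractive. It uses the standard formulas for the effective extent of a dilated kernel and for the weight counts of a standard versus a depthwise separable convolution. The kernel size (3), the channel counts (64 in, 64 out), and the dilation rates (1, 2, 3) are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
# Illustrative arithmetic only: all sizes below are assumptions, not the paper's settings.


def effective_kernel(k: int, dilation: int) -> int:
    """Side length of the input window covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a depthwise k x k convolution plus a 1 x 1 pointwise convolution.

    Dilation spreads the kernel taps apart but does not add weights,
    so the count is the same for every dilation rate.
    """
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def multi_scale_summary(k: int, c_in: int, c_out: int, dilations):
    """Describe one separable branch per dilation rate and the total weight count."""
    branches = [
        {
            "dilation": d,
            "effective_kernel": effective_kernel(k, d),
            "params": separable_conv_params(k, c_in, c_out),
        }
        for d in dilations
    ]
    total = sum(branch["params"] for branch in branches)
    return branches, total


if __name__ == "__main__":
    # Hypothetical configuration: 3 x 3 kernels, 64 -> 64 channels, three parallel branches.
    branches, total = multi_scale_summary(3, 64, 64, (1, 2, 3))
    for branch in branches:
        print(branch)
    print("three separable branches:", total)
    print("one standard 3 x 3 conv:", standard_conv_params(3, 64, 64))
```

Under these assumed sizes, the three branches cover 3 x 3, 5 x 5, and 7 x 7 input windows while together using 14016 weights, compared with 36864 for a single standard 3 x 3 convolution with the same channel counts. How the paper actually combines the branches and how many it uses is not stated in the abstract.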

Keywords: pose estimation; multi-scale dilated separable convolution; Vision Transformer; Transformer encoder

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
