Short-term Action Recognition Based on Improved Transformer

Authors: WANG He; WANG Xin Ye (College of Physics and Electronic Engineering, Shanxi University, Taiyuan 030006, China)

Affiliation: [1] College of Physics and Electronic Engineering, Shanxi University, Taiyuan 030006, China

Source: Network New Media Technology, 2025, No. 1, pp. 33-40

Abstract: Human action recognition is one of the most important research topics in computer vision, and improving recognition accuracy has long been the focus of work in this area. Beyond the traditional convolutional and recurrent layers, action recognition models also use attention mechanisms to improve generalization. To this end, this paper proposes an action recognition algorithm based on image relative position encoding (IRPE) and the Transformer architecture. A position encoding layer is added to improve the model's ability to understand sequences; a Twin Transformer layer is then placed in front of the Transformer encoder to efficiently extract low-level feature representations; finally, a multilayer perceptron produces the class prediction. Experimental results show that the model achieves accuracies of 95.87% (Split1), 94.50% (Split2), and 95.94% (Split3) on the OpenPose version of the MPOSE2021 dataset, and 91.03% (Split1), 90.40% (Split2), and 89.94% (Split3) on the PoseNet version.
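A minimal PyTorch-style sketch of the pipeline described in the abstract (pose tokens, position encoding, a Twin Transformer layer in front of the main encoder, and an MLP classification head) is given below. This is not the authors' implementation: the 30-frame / 25-joint input shape, all layer sizes, and the two-branch TwinBlock are illustrative assumptions, and a plain learnable position embedding stands in for the paper's IRPE layer.

    # Sketch only: layer ordering follows the abstract; all sizes are assumptions.
    import torch
    import torch.nn as nn

    class TwinBlock(nn.Module):
        """Hypothetical two-branch block: two parallel Transformer encoder layers
        whose outputs are summed, standing in for the Twin Transformer layer."""
        def __init__(self, dim, heads=4):
            super().__init__()
            self.branch_a = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            self.branch_b = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

        def forward(self, x):
            return self.branch_a(x) + self.branch_b(x)

    class SkeletonActionTransformer(nn.Module):
        def __init__(self, num_frames=30, num_joints=25, channels=3, dim=128, num_classes=20):
            super().__init__()
            self.embed = nn.Linear(num_joints * channels, dim)        # per-frame pose -> token
            self.pos = nn.Parameter(torch.zeros(1, num_frames, dim))  # learnable position encoding
            self.twin = TwinBlock(dim)                                # low-level feature extractor
            encoder_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
            self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))  # MLP head

        def forward(self, x):                  # x: (batch, frames, joints * channels)
            tokens = self.embed(x) + self.pos  # add position information to frame tokens
            tokens = self.twin(tokens)         # twin block before the main encoder
            tokens = self.encoder(tokens)
            return self.head(tokens.mean(dim=1))  # average over time, then classify

    model = SkeletonActionTransformer()
    logits = model(torch.randn(2, 30, 25 * 3))   # two dummy pose clips
    print(logits.shape)                          # torch.Size([2, 20])

Note that IRPE, as a relative position encoding, is normally injected into the attention computation itself rather than added to the input tokens; the sketch only mirrors the overall layer ordering described in the abstract.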

Keywords: deep learning; action recognition; position encoding; Transformer; multilayer perceptron

CLC Number: TP391.41 [Automation and Computer Technology - Computer Application Technology]

 
