Authors: WANG He (王贺); WANG Xin Ye (王馨叶) (College of Physics and Electronic Engineering, Shanxi University, Taiyuan 030006, China)
Source: Network New Media Technology (《网络新媒体技术》), 2025, No. 1, pp. 33-40 (8 pages)
Abstract: Human action recognition is an important research topic in computer vision, and improving recognition accuracy has long been a focus of the field. In addition to traditional convolutional and recurrent layers, human action recognition models also use attention mechanisms to improve generalization efficiency. This paper therefore proposes an action recognition algorithm based on image relative position encoding (IRPE) and the Transformer architecture. A position encoding layer is added to improve the model's ability to understand sequences; a Twin Transformer layer is placed before the Transformer encoder to efficiently extract low-level feature representations; and a multilayer perceptron at the end of the framework produces the final class prediction. Experimental results show that on the MPOSE2021 dataset the model achieves accuracies of 95.87% (Split1), 94.50% (Split2), and 95.94% (Split3) on the OpenPose version, and 91.03% (Split1), 90.40% (Split2), and 89.94% (Split3) on the PoseNet version.
Keywords: deep learning; action recognition; position encoding; Transformer; multilayer perceptron
Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]