Survey of visual object tracking methods based on Transformer  (cited by: 3)


Authors: SUN Ziwen; QIAN Lizhi; YANG Chuandong; GAO Yibo; LU Qingyang; YUAN Guanglin (Laboratory of Guidance Control and Information Perception Technology of High Overload Projectiles, Army Academy of Artillery and Air Defense, Hefei, Anhui 230031, China; Department of Information Engineering, Army Academy of Artillery and Air Defense, Hefei, Anhui 230031, China)

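Affiliations: [1] Laboratory of Guidance Control and Information Perception Technology of High Overload Projectiles, Army Academy of Artillery and Air Defense, Hefei 230031 [2] Department of Information Engineering, Army Academy of Artillery and Air Defense, Hefei 230031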

Source: Journal of Computer Applications (《计算机应用》), 2024, Issue 5, pp. 1644-1654 (11 pages)

Funding: Military equipment model project (LZX20190112).

Abstract: Visual object tracking is one of the important tasks in computer vision, and a large number of tracking methods have been proposed in recent years in pursuit of high-performance tracking. Among them, Transformer-based methods have become a research hotspot in visual object tracking because of their ability to model global dependencies and capture contextual information. First, existing Transformer-based visual object tracking methods are classified by network structure, the underlying principles and key techniques for model improvement are outlined, and the advantages and disadvantages of the different network structures are summarized. Second, the experimental results of these methods on public datasets are compared to analyze the impact of network structure on performance; MixViT-L (ConvMAE) achieves tracking success rates of 73.3% and 86.1% on LaSOT and TrackingNet, respectively, indicating that tracking methods based on a pure Transformer two-stage architecture offer better performance and broader development prospects. Finally, the current limitations of these methods are summarized, namely complex network structures, large parameter counts, high training requirements, and difficulty of deployment on edge devices, and future research priorities are discussed: combining these methods with model compression, self-supervised learning, and Transformer interpretability analysis could yield more feasible solutions for Transformer-based visual object tracking.

Keywords: computer vision; object tracking; hybrid network structure; deep learning; Siamese network; Transformer

Classification code: TP391.4 [Automation and Computer Technology - Computer Application Technology]

 
