DPT-tracker: Dual pooling transformer for efficient visual tracking

Authors: Yang Fang, Bailian Xie, Uswah Khairuddin, Zijian Min, Bingbing Jiang, Weisheng Li

Affiliations:
[1] Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing, China
[2] Department of Mechanical Precision Engineering, Malaysia-Japan International Institute of Technology, University of Technology Malaysia, Kuala Lumpur, Malaysia
[3] Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
[4] School of Information Science and Technology, Hangzhou Normal University, Hangzhou, China

Source: CAAI Transactions on Intelligence Technology (智能技术学报（英文）), 2024, Issue 4, pp. 948-959 (12 pages)

Funding: National Natural Science Foundation of China (Grant/Award Number: 62006065); Science and Technology Research Program of Chongqing Municipal Education Commission (Grant/Award Number: KJQN202100634); Natural Science Foundation of Chongqing (Grant/Award Number: CSTB2022NSCQ-MSX1202); Chongqing Municipal Education Commission (Grant/Award Number: KJQN202100634).

Abstract: Transformer tracking typically takes paired template and search images as encoder input and performs feature extraction and target-search feature correlation through self- and/or cross-attention operations, so the model complexity grows quadratically with the number of input images. To alleviate the burden of this tracking paradigm and facilitate the practical deployment of Transformer-based trackers, we propose a dual pooling Transformer tracking framework, dubbed DPT, which consists of three components: a simple yet efficient spatiotemporal attention model (SAM), a mutual correlation pooling Transformer (MCPT), and a multiscale aggregation pooling Transformer (MAPT). SAM is designed to gracefully aggregate the temporal dynamics and spatial appearance information of multi-frame templates along the space-time dimensions. MCPT aims to capture multiscale pooled and correlated contextual features, and is followed by MAPT, which aggregates the multiscale features into a unified feature representation for tracking prediction. The DPT tracker achieves an AUC score of 69.5 on LaSOT and a precision score of 82.8 on TrackingNet while using a shorter sequence of attention tokens and fewer parameters and FLOPs than existing state-of-the-art (SOTA) Transformer tracking methods. Extensive experiments demonstrate that the DPT tracker provides a strong real-time tracking baseline with a good trade-off between tracking performance and inference efficiency.
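The complexity argument above can be made concrete by counting attention-map entries. The following is a minimal sketch under stated assumptions, not the DPT implementation of SAM, MCPT, or MAPT: square images, non-overlapping 16-pixel patches, 128-pixel templates and a 256-pixel search region (common choices in Transformer trackers, not figures taken from the paper), and a generic key/value pooling step with a hypothetical factor of 2 per side. The helper names `tokens_per_image`, `joint_attention_entries`, and `pooled_attention_entries` are illustrative only. The sketch shows why concatenating more template frames inflates joint attention quadratically and how pooling keys and values shrinks that cost.

```python
"""Rough attention-cost count for a joint template/search Transformer tracker.

All sizes and the pooling factor are illustrative assumptions, not DPT settings.
"""


def tokens_per_image(side_px: int, patch_px: int) -> int:
    """Number of patch tokens for a square image split into non-overlapping patches."""
    if side_px % patch_px:
        raise ValueError("image side must be divisible by patch size")
    return (side_px // patch_px) ** 2


def joint_attention_entries(num_templates: int, template_px: int,
                            search_px: int, patch_px: int = 16) -> int:
    """Entries in one attention map when all template and search tokens attend jointly.

    Queries and keys both come from the concatenated sequence of length N,
    so the map has N * N entries; N grows linearly with the number of templates.
    """
    n = (num_templates * tokens_per_image(template_px, patch_px)
         + tokens_per_image(search_px, patch_px))
    return n * n


def pooled_attention_entries(num_templates: int, template_px: int, search_px: int,
                             patch_px: int = 16, pool: int = 2) -> int:
    """Same queries, but keys/values are spatially pooled by `pool` per side.

    Generic pooling attention (hypothetical setting): each image's token grid
    shrinks by pool**2, so the key/value sequence is roughly pool**2 times shorter.
    """
    t_side = template_px // patch_px
    s_side = search_px // patch_px
    if t_side % pool or s_side % pool:
        raise ValueError("token grid side must be divisible by the pooling factor")
    n_q = num_templates * t_side ** 2 + s_side ** 2
    n_kv = num_templates * (t_side // pool) ** 2 + (s_side // pool) ** 2
    return n_q * n_kv


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, joint_attention_entries(k, 128, 256), pooled_attention_entries(k, 128, 256))
    # 1 102400 25600
    # 2 147456 36864
    # 4 262144 65536
```

With these assumed sizes, each extra 128-pixel template adds 64 tokens to the sequence, and the joint attention map grows with the square of the total length. Pooling only the keys and values keeps one output per query token while cutting the map by the square of the pooling factor, which is one plausible way to read the abstract's "shorter sequence length of attention tokens".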

Keywords: human-computer interfacing; image motion analysis; pattern recognition; signal processing; tracking

CLC number: TM41 [Electrical Engineering: Electrical Apparatus]
