Siamese network object tracking algorithm based on sparse attention

Authors: CHEN Zhi-wang[1,2]; YANG Tian-yu; CAO Suo-hang; LV Chang-hao; PENG Yong[4]

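Affiliations: [1] Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao 066004, Hebei, China; [2] Key Laboratory of Industrial Computer Control Engineering of Hebei Province, Yanshan University, Qinhuangdao 066004, Hebei, China; [3] Key Lab of Power Electronics for Energy Conservation and Motor Drive of Hebei Province, Yanshan University, Qinhuangdao 066004, Hebei, China; [4] School of Electrical Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China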

Source: Control and Decision, 2024, No. 12, pp. 4017-4026 (10 pages)

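Funding: Hebei Province Graduate Professional Degree Excellent Teaching Case (Library) Project (KCJPZ2023012); National Natural Science Foundation of China (61573305); Natural Science Foundation of Hebei Province (F2022203038, F2019203511).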

Abstract: An improved Inception-Resnet-V2 (IRV2) network and a local-global-local (LGL) module are used to design SiamLGL (siamese local-global-local network), a Siamese network for object tracking that combines CNN and Transformer encoder structures. First, the feature extraction part uses the improved IRV2 network; because this network is deeper, the features it extracts from images are better than those extracted by shallower networks. The feature fusion part uses depth-wise cross-correlation to fuse the information on the feature maps. Second, the fused feature map is passed through the LGL module to obtain global and local information about the object. Inside the module, two encoders are connected in series: the first uses depthwise separable convolution to obtain local information about the object, and the second uses self-attention to obtain global features of the image. To reduce the time complexity of the self-attention structure, the attention is computed sparsely, which lowers the time complexity while preserving the accuracy of the network. Finally, the feature map is fed into the classification and regression network to generate the corresponding object location; the classification network uses the binary cross-entropy loss function, and the regression network uses Distance-IoU (DIoU) as its loss function. The algorithm is evaluated on six public datasets (GOT-10k, LaSOT, TrackingNet, UAV123, OTB100 and VOT2019), and the experimental results verify the effectiveness of the proposed algorithm.
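The abstract names Distance-IoU (DIoU) as the regression loss but does not give its form. As a minimal sketch, the standard DIoU definition is 1 - IoU + d^2 / c^2, where d is the distance between the two box centers and c is the diagonal of the smallest box enclosing both. The function name `diou_loss`, the corner format `(x1, y1, x2, y2)`, and the `eps` stabilizer are assumptions made for illustration; this is not taken from the SiamLGL implementation.

```python
def diou_loss(pred, target, eps=1e-9):
    """Distance-IoU loss for two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area; zero when the boxes do not overlap.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_dist2 = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)
    diag2 = enc_w * enc_w + enc_h * enc_h

    return 1.0 - iou + center_dist2 / (diag2 + eps)
```

Unlike a plain IoU loss, the center-distance term still varies when the predicted and ground-truth boxes do not overlap, so the loss keeps distinguishing near misses from far misses in that case.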

Keywords: object tracking; Siamese network; Inception-Resnet-V2 network; sparse attention; Distance-IoU loss

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
