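Affiliations: [1] National Center for Applied Mathematics in Chongqing, Chongqing 401331; [2] Guangzhou Institute of Technology, Xidian University, Guangzhou, Guangdong 710068; [3] Academy of Military Sciences of the Chinese People's Liberation Army, Beijing 100191; [4] Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, Liaoning 110169
Source: Infrared and Laser Engineering (《红外与激光工程》), 2024, No. 8, pp. 89-103 (15 pages)
Funding: National Natural Science Foundation of China (62302073); Natural Science Foundation of Chongqing (2024NSCQ-LZX0039); Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZDK202200501).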
Abstract: Because large-scale infrared tracking training datasets are lacking, most existing infrared tracking methods take a model pre-trained on large-scale visible-light data and fully fine-tune it on small-scale infrared data. However, as the parameter count of pre-trained models grows rapidly, the memory and time costs of full fine-tuning rise sharply, which limits research on and application of large models by low-resource users. To address this problem, an adaptive infrared target tracking algorithm that is efficient in parameters, memory, and time is proposed. First, the self-attention mechanism of the Transformer performs joint feature extraction and relation modeling on the template and search-region images, yielding feature representations that are more strongly associated with the target. Second, a side network built from low-rank adaptive matrices decouples the trainable parameters from the backbone network, reducing the number of parameters that must be trained and updated. Finally, a lightweight spatial feature enhancement module is designed to strengthen the features' ability to discriminate between target and background. The trainable parameters, memory, and time of the proposed method are only 0.04%, 39.6%, and 66.2% of those of full fine-tuning, respectively, yet its performance surpasses that of full fine-tuning. Comparative and ablation experiments on three standard infrared tracking datasets, LSOTB-TIR120, LSOTB-TIR100, and PTB-TIR, demonstrate that the proposed method is effective. On LSOTB-TIR120 it achieves a success rate of 73.7%, a precision of 86.0%, and a normalized precision of 78.5%; on LSOTB-TIR100, a success rate of 71.6%, a precision of 83.9%, and a normalized precision of 76.1%; and on PTB-TIR, a success rate of 69.0% and a precision of 84.9%. All of these are state-of-the-art tracking results.
Objective: Since infrared images have limitations such as low resolution and limited target texture detail, it is crucial to learn strongly discriminative feature representations. The field of infrared target tracking currently lacks large-scale infrared tracking training datasets. The largest infrared tracking training dataset in the tracking benchmarks is currently LSOTB-TIR, which consists of 650000 trainable video frames. This dataset partially addresses the shortage of labeled infrared data. However, it is still significantly smaller than visible-light tracking datasets such as LaSOT, GOT-10k, and TrackingNet, which contain 2.8 million, 1.4 million, and 14 million trainable video frames, respectively. As a result, most existing deep learning-based infrared target tracking methods follow a common approach of pre-training on large-scale visible-light data and fine-tuning on small-scale infrared data. However, this full fine-tuning becomes prohibitively expensive when training a Transformer tracker with a large number of parameters, which limits the ability of researchers and users with limited resources to explore and apply large-scale models.
Methods: To address this issue, this paper proposes an adaptive infrared target tracking algorithm that is efficient in terms of parameters, memory, and time. First, it performs joint feature extraction and relation modeling on the template and search-region images using the self-attention mechanism of the Transformer. This process yields feature representations that are more closely associated with the target. Second, low-rank adaptive matrices are employed in a side network to decouple the trainable parameters from the backbone network. This reduces the number of parameters that need to be trained and updated. Finally, a lightweight spatial feature enhancement module is designed to improve the features' ability to discriminate between targets and backgrounds.
Results and Discussions: The proposed method achieves superior performance while requiring significa
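Keywords: infrared target tracking; parameter-efficient fine-tuning; low-rank adaptive matrix; feature decoupling; Transformer
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]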
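The second step of the abstract, keeping the pre-trained weights frozen and training only low-rank matrices, can be illustrated with a minimal sketch. This is a generic low-rank adaptation layer and not the paper's side-network design, which the abstract does not specify. The class name `LowRankAdapter`, the helper `matvec`, the dimensions (64 x 64), and the rank (2) are all hypothetical choices for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LowRankAdapter:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Illustrative assumption: a plain additive low-rank update, not the
    side network described in the paper.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Frozen "pre-trained" weight: never updated during adaptation.
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable down-projection A (rank x d_in), small random init.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # Trainable up-projection B (d_out x rank), zero init so the layer
        # starts out identical to the frozen layer.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + u for b, u in zip(base, update)]

    def trainable_params(self):
        return self.rank * (self.d_in + self.d_out)

    def frozen_params(self):
        return self.d_in * self.d_out


layer = LowRankAdapter(d_in=64, d_out=64, rank=2)
ratio = layer.trainable_params() / layer.frozen_params()
print(f"trainable: {layer.trainable_params()}, frozen: {layer.frozen_params()}, ratio: {ratio:.4f}")
```

The trainable count grows as rank x (d_in + d_out) while the frozen count grows as d_in x d_out, so the trainable share shrinks as the layer gets wider. The ratio printed here depends entirely on the toy dimensions and should not be compared with the 0.04% figure reported in the abstract.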