Efficient Reconstruction of Spatial Features for Remote Sensing Image-Text Retrieval  


Authors: ZHANG Weihang; CHEN Jialiang; ZHANG Wenkai; LI Xinming; GAO Xin; SUN Xian

Affiliations: [1] Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, P.R. China; [2] Key Laboratory of Target Cognition and Application Technology (TCAT), Chinese Academy of Sciences, Beijing 100190, P.R. China; [3] School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, P.R. China; [4] School of Computer Science and Artificial Intelligence, Aerospace Information Technology University, Jinan 250299, P.R. China

Source: Transactions of Nanjing University of Aeronautics and Astronautics, 2025, No. 1, pp. 101-111 (11 pages)

Funding: Supported by the National Key R&D Program of China (No. 2022ZD0118402).

Abstract: Remote sensing cross-modal image-text retrieval (RSCIR) can flexibly and subjectively retrieve remote sensing images using query text, and has recently attracted growing attention from researchers. However, as the parameter counts of visual-language pre-training models continue to grow, direct transfer learning consumes a substantial amount of computational and storage resources. Moreover, recently proposed parameter-efficient transfer learning methods focus mainly on reconstructing channel features, ignoring the spatial features that are vital for modeling key entity relationships. To address these issues, we design an efficient transfer learning framework for RSCIR based on spatial feature efficient reconstruction (SPER). A concise and efficient spatial adapter is introduced to enhance the extraction of spatial relationships. With only a few parameters, the spatial adapter spatially reconstructs the features in the backbone while incorporating prior information from the channel dimension. We conduct quantitative and qualitative experiments on two commonly used RSCIR datasets. Compared with traditional methods, our approach achieves an improvement of 3%-11% in the sumR metric. Compared with methods fine-tuning all parameters, our proposed method trains less than 1% of the parameters while maintaining about 96% of the overall performance.
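The core idea described in the abstract (a lightweight adapter that reconstructs features along the spatial dimension while reusing a channel-dimension bottleneck, trained alongside a frozen backbone) can be illustrated with a minimal numpy sketch. This is a hypothetical toy implementation, not the authors' SPER code: the class name, shapes, and initialization scheme are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class SpatialAdapter:
    """Toy sketch of a spatial adapter (illustrative, not the paper's code).

    Features are down-projected along the channel dimension (the channel
    "prior"), mixed across spatial tokens to reconstruct spatial
    relationships, up-projected back, and added to the input as a residual.
    """

    def __init__(self, n_tokens, channels, rank=8):
        # Channel bottleneck: channels -> rank
        self.down = rng.normal(0.0, 0.02, (channels, rank))
        # Spatial reconstruction: linear mixing over the token axis
        self.mix = rng.normal(0.0, 0.02, (n_tokens, n_tokens))
        # Zero-init up-projection, so the adapter starts as an identity map
        self.up = np.zeros((rank, channels))

    def n_params(self):
        return self.down.size + self.mix.size + self.up.size

    def __call__(self, x):           # x: (n_tokens, channels)
        h = x @ self.down            # compress channels
        h = self.mix @ h             # mix across spatial positions
        return x + h @ self.up       # residual connection

# Example: a ViT-B/16-style block on a 224x224 image has 196 patch tokens
# and 768 channels; the adapter adds only ~5e4 trainable parameters,
# far below 1% of an ~86M-parameter backbone.
adapter = SpatialAdapter(n_tokens=196, channels=768, rank=8)
x = rng.normal(size=(196, 768))
y = adapter(x)
```

Zero-initializing the up-projection is a common adapter trick: at the start of fine-tuning the module passes features through unchanged, so training only gradually deviates from the frozen pre-trained backbone.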

Keywords: remote sensing cross-modal image-text retrieval (RSCIR); spatial features; channel features; contrastive learning; parameter-efficient transfer learning

Classification code: TP391.3 (Automation and Computer Technology: Computer Application Technology)
