Affiliations: [1] Information Engineering University, Zhengzhou 450001, China; [2] Key Laboratory of Spatiotemporal Perception and Intelligent Processing, Ministry of Natural Resources, Zhengzhou 450001, China; [3] Key Laboratory of Smart Earth, Beijing 100020, China; [4] Unit 32158, Kashgar 844000, China
Source: Journal of Geo-information Science, 2025, No. 1, pp. 193-206 (14 pages)
Funding: National Natural Science Foundation of China (Grant Nos. 42301464, 42201443)
Abstract:
[Objectives] Cross-view image matching and localization refers to determining the geographic location of a ground-view query image by matching it against geotagged aerial reference images. However, significant differences in geometric appearance and spatial layout between viewpoints often defeat traditional image matching algorithms. Existing methods typically use Convolutional Neural Networks (CNNs) with fixed receptive fields or Transformers with global modeling capability as the feature extraction backbone; these fail to fully account for the scale differences among features within an image, and their large parameter counts and high computational complexity make lightweight deployment a significant challenge.
[Methods] To address these issues, this paper proposes a lightweight cross-view image matching and localization method based on multi-scale feature aggregation for ground panoramic and satellite images. The method first extracts image features with LskNet, then applies a purpose-designed multi-scale feature aggregation module that combines the features into a global descriptor. The module decomposes a single large convolution kernel into two sequential smaller depth-wise convolutions, aggregating features at multiple scales while significantly reducing the network's parameter count and computation. Spatial layout information is also encoded into the global feature, producing a more discriminative descriptor.
[Results] Comparison and ablation experiments on the public CVUSA, CVACT, and VIGOR datasets show that the proposed method achieves Top-1 recall of 79.00% on VIGOR and 91.43% on CVACT, surpassing the current most accurate method, Sample4Geo, by 1.14% and 0.62%, respectively. On CVUSA, the Top-1 recall reaches 98.64%, nearly identical to Sample4Geo, while the parameter count and computation drop to 30.09 M and 16.05 GFLOPs, only 34.36% and 23.70% of Sample4Geo's.
[Conclusions] Compared with existing methods, the proposed approach maintains high accuracy while markedly reducing parameters and computation, lowering the hardware requirements for model deployment.
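The parameter savings from decomposing one large depth-wise kernel into two sequential smaller depth-wise convolutions can be sketched with simple arithmetic. The specific sizes below (a 5×5 depth-wise convolution followed by a 7×7 depth-wise convolution with dilation 3, in the style of LskNet) are illustrative assumptions; the abstract does not state the paper's exact configuration.

```python
# Sketch: parameter cost of one large depth-wise convolution vs. a
# decomposition into two sequential smaller depth-wise convolutions.
# Kernel sizes (5, then 7 with dilation 3) are assumed for illustration.

def receptive_field(k1: int, k2: int, dilation: int) -> int:
    """Effective receptive field of conv(k1) followed by conv(k2, dilation)."""
    return k1 + (k2 - 1) * dilation

def depthwise_params(kernel: int, channels: int) -> int:
    """Weights in a depth-wise convolution: one kernel per channel."""
    return kernel * kernel * channels

channels = 64
k1, k2, d = 5, 7, 3

rf = receptive_field(k1, k2, d)             # 5 + 6*3 = 23
single = depthwise_params(rf, channels)     # one 23x23 depth-wise conv
decomposed = (depthwise_params(k1, channels)
              + depthwise_params(k2, channels))

print(rf)                                   # 23
print(single, decomposed)                   # 33856 4736
print(round(decomposed / single, 4))        # 0.1399
```

Under these assumed sizes, the two-stage decomposition covers the same 23×23 receptive field with roughly 14% of the weights of a single large kernel, which is the mechanism behind the reduced parameter count reported in the Results.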
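The Top-1 recall metric used in the Results measures how often a query's true geotagged reference ranks first among all references by descriptor similarity. A minimal sketch with toy 2-D descriptors follows; the vectors and the cosine-similarity ranking are illustrative assumptions, not values or implementation details from the paper.

```python
# Sketch: Top-k recall for descriptor-based cross-view retrieval.
# Toy descriptors only; real global descriptors are high-dimensional.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def recall_at_k(queries, references, ground_truth, k=1):
    """Fraction of queries whose true reference is among the k most similar."""
    hits = 0
    for qi, q in enumerate(queries):
        ranked = sorted(range(len(references)),
                        key=lambda ri: cosine(q, references[ri]),
                        reverse=True)
        if ground_truth[qi] in ranked[:k]:
            hits += 1
    return hits / len(queries)

# Toy setup: query 0 should retrieve reference 0, query 1 reference 2.
refs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
qs = [[0.9, 0.1], [0.6, 0.8]]
gt = [0, 2]
print(recall_at_k(qs, refs, gt, k=1))   # 1.0
```

The 79.00%, 91.43%, and 98.64% figures in the Results are this quantity (with k=1) computed over the VIGOR, CVACT, and CVUSA test sets, respectively.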
Keywords: cross-view image matching; multi-scale features; feature aggregation; large convolution kernel decomposition; lightweight; geo-localization
Classification: TP391.41 (Automation and Computer Technology: Computer Application Technology)