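Affiliations: [1] Jiangsu Province Engineering Research Center of Integrated Circuit Reliability Technology and Testing System, Wuxi University, Wuxi 214105 [2] School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044 [3] Key Laboratory of Intelligent Support Technology for Complex Environments, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044
Source: Journal of Geo-information Science (《地球信息科学学报》), 2024, Issue 12, pp. 2741-2758 (18 pages)
Funding: National Natural Science Foundation of China (62071240, 62106111); Jiangsu Province Industry-Education Integration First-Class Course (2022-133); Wuxi University Teaching Reform Research Projects (XYJG2023010, XYJG2023011).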
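Abstract: High-resolution remote sensing images suffer from blurred visual features of objects and from the same object exhibiting different spectra, which makes it difficult to segment similar ground objects and shadow-occluded ground objects from a single modality. This paper therefore proposes a remote sensing image segmentation model based on multi-modal feature extraction and hierarchical perception. A multi-modal feature extraction module extracts feature information from different modalities, and a coordinate attention mechanism fully fuses the features of those modalities. The abstract feature extraction module uses MobileNetV3 with dual-path bottleneck blocks as the backbone network and introduces a hierarchical perception network to extract deep abstract features; the attention mechanism is improved by embedding scene perception of pixels, which achieves efficient and accurate class-level context modeling. The decoder uses multi-scale aggregation dual fusion to combine low-level features with high-level abstract semantic features and recovers features through progressive upsampling. Experiments on high-resolution remote sensing images from the ISPRS Vaihingen and Potsdam datasets show that: (1) compared with a series of models including C3Net, AMM-FuseNet, MMFNet, CMFet, CIMFNet and EDGNet, MFEHPNet achieves significant improvements on all performance metrics, confirming its semantic segmentation performance on remote sensing images; (2) on ISPRS Vaihingen and Potsdam, MFEHPNet attains an overall accuracy of 92.21% and 93.45%, a mean intersection over union of 83.24% and 83.94%, and a frequency-weighted intersection over union of 89.24% and 90.12%, with a Kappa of 0.85, significantly improving semantic segmentation of remote sensing images and effectively addressing blurred feature boundaries and the same object exhibiting different spectra.
In high-resolution remote sensing images, challenges such as blurred visual features of objects and different spectra for the same object arise. Segmenting similar ground objects and shaded ground objects from a single modality is difficult. Therefore, this paper proposes a remote sensing image segmentation model based on multi-modal feature extraction and hierarchical perception. The proposed model introduces a multi-modal feature extraction module to capture feature information from different modalities. Using the complementary information of IRRG and DSM, accurate pixel positions in the feature map are obtained, improving semantic segmentation of high-resolution remote sensing images. The coordinate attention mechanism fully fuses the features from different modalities to address blurred visual features and different spectra for the same object during image segmentation. The abstract feature extraction module uses MobileNetV3 with dual-path bottleneck blocks as the backbone network, reducing the number of parameters while maintaining model accuracy. The hierarchical perception network is introduced to extract deep abstract features, and the attention mechanism is improved by embedding scene perception of pixels. Leveraging the inherent spatial correlation of ground objects in remote sensing images, efficient and accurate class-level context modeling is achieved, minimizing excessive background noise interference and significantly improving the semantic segmentation performance. In the decoding module, the model uses multi-scale aggregation dual fusion for feature recovery, strengthening the connection between the encoder and the decoder. This combines low-level features with high-level abstract semantic features, enabling effective fusion of spatial and detail features. Progressive upsampling is used for feature recovery, resolving the issue of blurred visual features and improving segmentation accuracy. Based on high-resolution remote sensing images from the ISPRS Vaihingen and Potsdam datasets, the experimental results demonstrate that MFEHPNe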
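Keywords: remote sensing image segmentation; multi-modal feature extraction; dual-path bottleneck block; hierarchical perception; multi-scale aggregation; dual fusion
Classification number: TP751 [Automation and Computer Technology: Detection Technology and Automatic Equipment]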
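
Note on the reported metrics: the abstract quotes overall accuracy, mean intersection over union, Kappa, and frequency-weighted intersection over union. The sketch below shows how these four quantities are commonly derived from a pixel-level confusion matrix. It assumes the standard textbook definitions; the record does not state the paper's exact evaluation protocol (for example, which classes or boundary pixels are excluded), and the function name `segmentation_metrics` is hypothetical.

```python
def segmentation_metrics(conf):
    """Compute OA, mIoU, Kappa and FWIoU from a square confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Standard definitions are assumed here; they are
    not taken from the paper itself.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    true_counts = [sum(conf[i]) for i in range(n)]                       # pixels per true class
    pred_counts = [sum(conf[i][j] for i in range(n)) for j in range(n)]  # pixels per predicted class
    correct = [conf[i][i] for i in range(n)]

    # Overall accuracy: fraction of correctly labelled pixels.
    oa = sum(correct) / total

    # Per-class IoU = TP / (TP + FP + FN); classes that never occur are skipped.
    ious = {}
    for i in range(n):
        union = true_counts[i] + pred_counts[i] - correct[i]
        if union > 0:
            ious[i] = correct[i] / union
    miou = sum(ious.values()) / len(ious)

    # Frequency-weighted IoU: each class IoU weighted by its share of true pixels.
    fwiou = sum(true_counts[i] / total * iou for i, iou in ious.items())

    # Cohen's Kappa: agreement corrected for the chance agreement pe.
    pe = sum(true_counts[i] * pred_counts[i] for i in range(n)) / (total * total)
    kappa = (oa - pe) / (1 - pe) if pe < 1 else 1.0

    return {"oa": oa, "miou": miou, "fwiou": fwiou, "kappa": kappa}


if __name__ == "__main__":
    # Toy two-class confusion matrix (illustrative numbers, not from the paper).
    toy = [[50, 10],
           [5, 35]]
    for name, value in segmentation_metrics(toy).items():
        print(f"{name}: {value:.4f}")
```

Working from the confusion matrix means all four metrics come from a single accumulation pass over the test tiles, so they stay mutually consistent.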