Authors: 刘异 (LIU Yi)[1]; 张寅捷 (ZHANG Yinjie); 敖洋 (AO Yang); 江大龙 (JIANG Dalong); 张肇睿 (ZHANG Zhaorui) (School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China)
Affiliation: [1] School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
Source: National Remote Sensing Bulletin (《遥感学报》), 2024, Issue 12, pp. 3173-3183 (11 pages)
Funding: National Natural Science Foundation of China (Grant No. 62071341)
Abstract: Buildings are among the most common infrastructure in cities, and extracting building regions from remote sensing imagery matters for urban planning, population estimation, and disaster assessment. Based on the Transformer architecture, this paper designs an end-to-end method for extracting building regions from remote sensing imagery. First, to address the information redundancy and inconsistency among multiscale image features, we propose a triple feature pyramid structure, Tri-FPN (Triple-Feature Pyramid Network), which performs global multiscale information fusion beyond immediately neighboring scales, improving the class-representation consistency of multiscale features while reducing redundancy. Second, to address the fact that fusing multiscale extraction results usually considers only the scale factor, we design an attention module that accounts for scale, class, and space, CSA-Module (Class-Scale Attention Module), which effectively fuses building-extraction results across scales. Finally, Tri-FPN and CSA-Module are added to the Transformer backbone for model training, yielding the best building-extraction results. Comparative experiments show that the proposed method raises the building detection rate, produces more accurate building outlines, and improves extraction accuracy, achieving IoU scores of 91.53% on the WHU Building dataset and 81.7% on the INRIA dataset.

As deep learning develops, researchers are paying increasing attention to its application in remote sensing building extraction. Many experiments on multiscale feature fusion, which boosts performance during the feature inference stage, and multiscale output fusion have been conducted to achieve a trade-off between accuracy and efficiency and to obtain enhanced details and overall effects. However, current multiscale feature fusion methods consider only the nearest feature scales, which is insufficient for cross-scale feature fusion. Multiscale output fusion is likewise limited to a unary correlation that considers only the scale element. To address these problems, we propose a feature fusion method and a result fusion module to improve the accuracy of building extraction from remote sensing images. This study proposes the Triple-Feature Pyramid Network (Tri-FPN) and the Class-Scale Attention Module (CSA-Module), built on Segformer, to extract buildings from remote sensing images. The network is divided into three components: feature extraction, feature fusion, and the classification head. In the feature extraction component, the Segformer structure is adopted to extract multiscale features. Segformer uses self-attention to extract feature maps at different scales. To adaptively enlarge the receptive field, Segformer applies a strided convolution kernel to shrink the key and value vectors in the self-attention computation, which considerably reduces the calculation cost. The goal of the feature fusion component is to fuse multiscale features from different stages of the feature extraction network. Tri-FPN consists of three feature pyramid networks; fusion follows a top-down, bottom-up, top-down sequence, enlarging the scale receptive field. The basic fusion blocks are a 3×3 convolution with element-wise feature addition and a 1×1 convolution with channel concatenation. This design helps maintain spatial diversity and intra-class feature consistency. In the classification head component, each pixel is assigned a class label (building or background).
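The spatial-reduction attention attributed to Segformer above (a strided convolution shrinking the key/value tokens before attention) can be sketched as follows. This is a minimal PyTorch re-implementation in the spirit of that design, not the authors' code; the class name, head count, and `sr_ratio` parameter are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    """Self-attention with spatial reduction of keys/values, in the spirit
    of Segformer's efficient attention (illustrative sketch, not the paper's
    code). A strided convolution shrinks the K/V token map by sr_ratio^2,
    cutting the attention cost accordingly."""
    def __init__(self, dim, num_heads=4, sr_ratio=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        # Strided conv that reduces the spatial size of the K/V tokens.
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence with N = H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # Spatially reduce the token map before computing K and V.
        x_ = x.transpose(1, 2).reshape(B, C, H, W)
        x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)  # (B, N / r^2, C)
        x_ = self.norm(x_)
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)  # each (B, heads, N / r^2, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

With `sr_ratio=2` the attention matrix has N × N/4 entries instead of N × N, which is where the cost saving the abstract mentions comes from.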
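The Tri-FPN fusion sequence (top-down, bottom-up, top-down, built from 3×3 convolutions with element-wise addition and a 1×1 convolution over a channel concatenation) might be sketched as below. Channel widths, module names, and the exact placement of the 1×1 fusion are assumptions; the abstract does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriFPN(nn.Module):
    """Sketch of the triple feature pyramid described in the abstract:
    three fusion passes (top-down, bottom-up, top-down) over multiscale
    features. Layer names and channel sizes are assumptions, not the
    authors' implementation. Each pass mixes neighbouring scales with a
    3x3 conv after element-wise addition; a final 1x1 conv fuses the
    channel concatenation of all refined levels."""
    def __init__(self, channels, num_levels=4):
        super().__init__()
        def convs():
            return nn.ModuleList(
                nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_levels)
            )
        self.td1, self.bu, self.td2 = convs(), convs(), convs()
        self.fuse = nn.Conv2d(channels * num_levels, channels, 1)

    @staticmethod
    def _match(src, dst):
        # Resize src to dst's spatial size before element-wise addition.
        return F.interpolate(src, size=dst.shape[-2:], mode="bilinear",
                             align_corners=False)

    def _top_down(self, feats, convs):
        out = list(feats)
        for i in range(len(out) - 2, -1, -1):  # coarse -> fine
            out[i] = convs[i](out[i] + self._match(out[i + 1], out[i]))
        return out

    def _bottom_up(self, feats, convs):
        out = list(feats)
        for i in range(1, len(out)):  # fine -> coarse
            out[i] = convs[i](out[i] + self._match(out[i - 1], out[i]))
        return out

    def forward(self, feats):
        # feats: list of (B, C, H_i, W_i), index 0 = finest scale.
        feats = self._top_down(feats, self.td1)
        feats = self._bottom_up(feats, self.bu)
        feats = self._top_down(feats, self.td2)
        # Concatenate all levels at the finest resolution, then 1x1 fuse.
        up = [self._match(f, feats[0]) for f in feats]
        return self.fuse(torch.cat(up, dim=1))
```

The three passes are what lets a given level see features more than one scale away, which is the "enlarged scale-receptive field" claimed in the abstract.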
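The abstract gives only a high-level description of CSA-Module, so the following is a speculative sketch of one way to weight per-scale class score maps by a joint scale-class-spatial attention; every design detail here is an assumption rather than the paper's actual module.

```python
import torch
import torch.nn as nn

class CSAModule(nn.Module):
    """Speculative sketch of a 'class-scale attention' result fusion,
    inferred from the abstract only (not the paper's CSA-Module design).
    Per-scale class score maps are stacked along the channel axis and
    weighted by an attention map spanning the joint scale-class-spatial
    axes, then summed over scales."""
    def __init__(self, num_scales, num_classes):
        super().__init__()
        in_ch = num_scales * num_classes
        self.attn = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1),
            nn.Sigmoid(),  # one weight per (scale, class, pixel)
        )
        self.num_scales = num_scales
        self.num_classes = num_classes

    def forward(self, logits_per_scale):
        # logits_per_scale: list of (B, num_classes, H, W), already resized
        # to a common resolution.
        x = torch.cat(logits_per_scale, dim=1)   # (B, S*K, H, W)
        w = self.attn(x)                         # joint attention weights
        x = (x * w).reshape(x.shape[0], self.num_scales,
                            self.num_classes, *x.shape[-2:])
        return x.sum(dim=1)                      # fused scores (B, K, H, W)
```

The point of coupling scale, class, and position in one weight map, as the abstract argues, is that a plain per-scale weight (a "unary correlation" in the scale element alone) cannot prefer, say, the fine scale for building edges and the coarse scale for building interiors.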
Keywords: remote sensing imagery; building extraction; deep learning; Transformer; image feature pyramid; class-scale attention
CLC Number: P237 [Astronomy & Earth Sciences — Photogrammetry and Remote Sensing]; P2 [Astronomy & Earth Sciences — Surveying and Mapping Science and Technology]