Spectral-spatial classification of hyperspectral imagery with a hybrid architecture of 3D-CNN and Transformer


Authors: JING Haizhao; TAO Lijie; ZHANG Haokui (Northwestern Polytechnical University, Xi'an 710129, China)

Affiliation: [1] Northwestern Polytechnical University, Xi'an 710129, Shaanxi, China

Source: Optics and Precision Engineering, 2024, No. 23, pp. 3504-3512 (9 pages)

Funding: National Natural Science Foundation of China (No. 62401471); 2024 Gusu Innovation and Entrepreneurship Leading Talent Program (Young Innovation Leading Talent) (No. ZXL2024333).

Abstract: To address pixel-level land-cover classification in hyperspectral images (HSI), a hybrid model named 3D-ConvFormer is proposed. In its shallow layers, 3D convolution (3D-CNN) operations extract local spatial-spectral features; in its deeper layers, a self-attention mechanism operates within the convolutional windows, fusing the translation invariance of convolutional networks with the flexible feature-extraction capability of self-attention. The model was evaluated on three public hyperspectral datasets (Indian Pines, PaviaU, and WHU-Hi-Longkou) using three metrics: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. Experimental results show that the model achieves an OA of 98.41%, AA of 97.56%, and Kappa of 98.16% on Indian Pines; an OA of 99.39%, AA of 99.30%, and Kappa of 99.18% on PaviaU; and an OA of 98.53%, AA of 98.97%, and Kappa of 98.06% on WHU-Hi-Longkou. Across all three classification tasks, 3D-ConvFormer outperforms the compared methods, effectively improving hyperspectral image classification accuracy.
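To make the described two-stage design concrete, the following is a minimal PyTorch sketch of a hybrid classifier that applies 3D convolutions to an HSI patch and then self-attention over the patch's spatial window. It is an illustrative assumption reconstructed from the abstract only, not the authors' 3D-ConvFormer: the class names, layer counts, channel widths, kernel sizes, and patch size are all placeholders.

# Minimal sketch of a hybrid 3D-CNN + windowed self-attention HSI classifier.
# Illustrative assumption only; not the authors' 3D-ConvFormer implementation.
import torch
import torch.nn as nn


class SpectralSpatialConv3D(nn.Module):
    """Shallow stage: 3D convolutions extract local spatial-spectral features."""
    def __init__(self, out_channels=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.Conv3d(16, out_channels, kernel_size=(5, 3, 3), padding=(2, 1, 1)),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):            # x: (B, 1, bands, H, W)
        return self.features(x)      # -> (B, C, bands, H, W)


class WindowSelfAttention(nn.Module):
    """Deep stage: self-attention over the spatial window of the patch."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens):       # tokens: (B, H*W, dim)
        h = self.norm(tokens)
        out, _ = self.attn(h, h, h)
        return tokens + out          # residual connection


class HybridHSIClassifier(nn.Module):
    def __init__(self, num_bands, num_classes, dim=64):
        super().__init__()
        self.conv3d = SpectralSpatialConv3D(out_channels=32)
        self.project = nn.Linear(32 * num_bands, dim)  # fold spectral axis into token dim
        self.attn_blocks = nn.Sequential(WindowSelfAttention(dim), WindowSelfAttention(dim))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):            # x: (B, 1, bands, H, W) patch around the target pixel
        f = self.conv3d(x)                                  # (B, C, bands, H, W)
        B, C, D, H, W = f.shape
        tokens = f.permute(0, 3, 4, 1, 2).reshape(B, H * W, C * D)
        tokens = self.project(tokens)                       # (B, H*W, dim)
        tokens = self.attn_blocks(tokens)
        return self.head(tokens.mean(dim=1))                # class logits for the center pixel


# Usage: classify an 11x11 spatial patch with 200 spectral bands (Indian Pines-like shapes).
model = HybridHSIClassifier(num_bands=200, num_classes=16)
logits = model(torch.randn(2, 1, 200, 11, 11))              # -> (2, 16)

The division of labor follows the abstract: the 3D convolutions capture local spatial-spectral structure with translation invariance, while the attention blocks let pixels within the window reweight one another adaptively before the center pixel is classified.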

Keywords: computer vision; hyperspectral image; convolutional neural network; Transformer; self-attention mechanism

CLC Number: TP394.1 [Automation and Computer Technology - Computer Application Technology]; TH691.9 [Automation and Computer Technology - Computer Science and Technology]
