Collaborative classification of hyperspectral and LiDAR data using a CNN-Transformer combined with contrastive learning

Collaborative classification of hyperspectral and LiDAR data based on CNN-transformer


Authors: WU Haibin [1]; DAI Shiyu; WANG Aili [1]; YUJI Iwahori; YU Xiaoyu

Affiliations: [1] Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, College of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin 150080, China; [2] Department of Computer Science, Chubu University, Aichi 487-8501, Japan; [3] College of Electron and Information, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528400, China

Source: Optics and Precision Engineering (《光学精密工程》), 2024, No. 7, pp. 1087-1100 (14 pages)

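Funding: Natural Science Foundation of Heilongjiang Province (No. JJ2023LH1143); Key Research and Development Program of Heilongjiang Province (No. JD2023SJ19); "Belt and Road" Innovative Talent Exchange Foreign Expert Project (No. G2022012010L); Heilongjiang Provincial Leading Talent Echelon Reserve Leader Funding Project.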

Abstract: To address cross-modal information expression and feature alignment in the multimodal classification of hyperspectral images (HSI) and LiDAR data, this paper proposes a contrastive learning based CNN-Transformer network (CLCT-Net) for the collaborative classification of hyperspectral and LiDAR data. First, CLCT-Net uses a shared feature extraction module built from ConvNeXt V2 Blocks to capture features common to the different modalities, addressing semantic alignment between data from heterogeneous sensors. Second, it builds a dual-branch HSI encoder, comprising a spatial-channel branch and a spectral context branch, and a LiDAR encoder that incorporates a frequency-domain self-attention mechanism, to obtain richer feature representations. Finally, it applies ensemble contrastive learning for classification to further improve the accuracy of multimodal collaborative classification. Experiments on the Houston 2013 and Trento datasets show that the proposed model achieves higher ground-object classification accuracy than other HSI and LiDAR classification models, reaching 92.01% and 98.90%, respectively, and extracts cross-modal features deeply and collaboratively.
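The abstract does not state the form of the contrastive objective, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE loss that pulls together the HSI and LiDAR embeddings of the same pixel patch and pushes apart mismatched pairs. It is not the CLCT-Net loss: the function names, the `temperature` value, and the pairing scheme are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_info_nce(hsi_emb, lidar_emb, temperature=0.1):
    """Hypothetical cross-modal contrastive loss (an assumption, not the paper's).

    hsi_emb[i] and lidar_emb[i] are embeddings of the same sample; every
    other pairing in the batch is treated as a negative.
    """
    h = [l2_normalize(v) for v in hsi_emb]
    l = [l2_normalize(v) for v in lidar_emb]
    # Cosine similarity matrix scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(hv, lv)) / temperature for lv in l]
           for hv in h]
    sim_t = [list(col) for col in zip(*sim)]
    # Average the HSI-to-LiDAR and LiDAR-to-HSI directions.
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy batch: two samples with 2-D embeddings per modality.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 3.0], [2.0, 0.0]])
```

In this toy batch the loss is close to zero when matching samples point in the same direction and large when the pairs are swapped, which is the behaviour a cross-modal alignment term is meant to reward.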

Keywords: hyperspectral image; LiDAR data; Transformer; convolutional neural network; contrastive learning

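CLC numbers: TP394.1 [Automation and Computer Technology: Computer Application Technology]; TH691.9 [Automation and Computer Technology: Computer Science and Technology]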

 
