Cross-modal dual-stream alternating interactive network for infrared-visible image classification
(Chinese title: 用于红外-可见光图像分类的跨模态双流交替交互网络)

Authors: ZHENG Zongsheng, DU Jia, CHENG Yuhe, ZHAO Zecheng, ZHANG Yuewei, WANG Xulong

Affiliations: [1] College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; [2] Guangzhou Meteorological Satellite Ground Station, Guangzhou, Guangdong 510650, China; [3] Shandong Provincial Institute of Land Space Data and Remote Sensing Technology (Shandong Provincial Marine Dynamic Monitoring Center), Jinan, Shandong 250014, China

Source: Journal of Computer Applications (《计算机应用》), 2025, No. 1: 275-283 (9 pages)

Funding: National Natural Science Foundation of China (41671431); Capacity Building Project for Local Universities of Shanghai Science and Technology Commission (19050502100); Guangzhou Meteorological Satellite Ground Station Project (D-8006-23-0157).

Abstract: When multiple feature modalities are fused, noise is superimposed, and the cascaded structures used to reduce inter-modal differences fail to fully exploit the feature information between modalities. To address these issues, a cross-modal Dual-stream Alternating Interactive Network (DAINet) method was proposed. Firstly, a Dual-stream Alternating Enhancement (DAE) module was constructed to fuse modal features in an interactive dual-branch way; by learning the mapping relationships between modal data and applying bidirectional feedback adjustment along InfraRed-VISible-InfraRed (IR-VIS-IR) and VISible-InfraRed-VISible (VIS-IR-VIS) paths, cross suppression of inter-modal noise was realized. Secondly, a Cross-Modal Feature Interaction (CMFI) module was constructed, in which a residual structure was introduced to effectively fuse the low-level and high-level features within and between the infrared and visible modalities, thereby reducing inter-modal differences and making full use of inter-modal feature information. Finally, experiments were conducted on a self-constructed infrared-visible multi-modal typhoon dataset and the public RGB-NIR multi-modal scene dataset to verify the effectiveness of the DAE and CMFI modules. Experimental results show that, compared with the simple cascading fusion method, the proposed DAINet-based feature fusion method improves the overall classification accuracy on the self-constructed typhoon dataset by 6.61 and 3.93 percentage points for the infrared and visible modalities, respectively, and raises the G-mean by 6.24 and 2.48 percentage points, respectively, demonstrating the generalizability of the proposed method for class-imbalanced classification tasks. On the RGB-NIR dataset, the proposed method improves the overall classification accuracy by 13.47 and 13.90 percentage points for the two test modalities, respectively. Meanwhile, in comparison experiments with IFCNN (general Image Fusion framework based on Convolutional Neural Network) and DenseFuse on the two datasets, the proposed method improves the overall classification accuracy under the two test modalities of the self-constructed typhoon dataset by 9.82 and 6.02, and by 17.38 and 1.68 percentage points, respectively.
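The abstract describes the two modules only at a high level. As an illustration of the alternating round-trip enhancement and residual cross-modal fusion it outlines, here is a minimal PyTorch-style sketch; the class names, layer choices, and exact feedback formulation are assumptions made for illustration, not the authors' published implementation.

```python
# Hypothetical sketch of the DAE / CMFI ideas from the abstract.
# All shapes, names, and the feedback formulation are assumptions.
import torch
import torch.nn as nn

class DAE(nn.Module):
    """Dual-stream Alternating Enhancement (sketch): each stream is mapped
    to the other modality and back (IR-VIS-IR, VIS-IR-VIS), and the
    round-trip result acts as feedback that suppresses inter-modal noise."""
    def __init__(self, channels: int = 64):
        super().__init__()
        def conv():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True))
        self.ir2vis = conv()  # cross-modal mapping IR -> VIS
        self.vis2ir = conv()  # cross-modal mapping VIS -> IR

    def forward(self, f_ir, f_vis):
        ir_as_vis = self.ir2vis(f_ir)
        vis_as_ir = self.vis2ir(f_vis)
        ir_feedback = self.vis2ir(ir_as_vis)   # IR-VIS-IR round trip
        vis_feedback = self.ir2vis(vis_as_ir)  # VIS-IR-VIS round trip
        # Alternating enhancement: each stream is refined by its own
        # round-trip feedback plus the other stream's projection.
        return f_ir + ir_feedback + vis_as_ir, f_vis + vis_feedback + ir_as_vis

class CMFI(nn.Module):
    """Cross-Modal Feature Interaction (sketch): residual fusion of
    low- and high-level features within and across the two modalities."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.fuse = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, ir_low, ir_high, vis_low, vis_high):
        stacked = torch.cat([ir_low, ir_high, vis_low, vis_high], dim=1)
        # Residual connection keeps high-level semantics while mixing in
        # low-level detail from both modalities.
        return ir_high + vis_high + self.fuse(stacked)

if __name__ == "__main__":
    f_ir, f_vis = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    ir_e, vis_e = DAE()(f_ir, f_vis)
    fused = CMFI()(f_ir, ir_e, f_vis, vis_e)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

The abstract also reports G-mean alongside overall accuracy on the class-imbalanced typhoon dataset. A short sketch of the conventional multi-class G-mean (the geometric mean of per-class recalls) follows; the paper's exact formulation may differ:

```python
# Conventional multi-class G-mean; not code from the paper.
import numpy as np

def g_mean(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

print(g_mean(np.array([0, 0, 1, 1, 1]), np.array([0, 1, 1, 1, 0])))  # ~0.577
```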

Keywords: cross-modal; deep learning; image classification; feature learning; dual-stream network

CLC number: TP18 (Automation and Computer Technology - Control Theory and Control Engineering)

 
