Infrared and Visible Image Fusion Method via Interactive Self-attention  (Cited by: 2)


Authors: YANG Fan, WANG Zhishe [1], SUN Jing, YU Zhaofa

Affiliations: [1] School of Applied Science, Taiyuan University of Science and Technology, Taiyuan 030024, China; [2] Ordnance NCO Academy, Army Engineering University of PLA, Wuhan 430075, China

Source: Acta Photonica Sinica (《光子学报》), 2024, Issue 6, pp. 214-225 (12 pages)

Funding: Fundamental Research Program of Shanxi Province (No. 202203021221144).

Abstract: Existing infrared and visible image fusion methods rely on either local or global feature representations alone and lack cross-modality feature interaction, which limits fusion performance. To address this problem, an interactive self-attention fusion method is proposed. A Transformer models the global dependencies of the local features extracted by a convolutional neural network, combining local and global relationships to improve feature representation capability. In addition, a cross-modality attention interaction model is constructed, allowing features to be passed interactively across different spatial locations and independent channels, realizing a local-to-global feature mapping and strengthening the complementary characteristics of the two image types. Subjective and objective experiments on the TNO, M3FD, and Roadscene datasets show that, compared with seven other state-of-the-art fusion methods, the proposed method has clear advantages in fusion performance, model generalization, and computational efficiency, verifying its effectiveness and superiority.

The fusion of infrared and visible images aims to merge their complementary information to generate a fused output with better visual perception and scene understanding. Existing CNN-based methods typically employ convolutional operations to extract local features but fail to model long-range relationships. Conversely, Transformer-based methods usually rely on a self-attention mechanism to model global dependencies but lack the complement of local information. More importantly, these methods often ignore the specialized interactive information learning of different modalities, which produces limited fusion performance. To address these issues, this paper introduces an infrared and visible image fusion method via interactive self-attention, namely ISAFusion. First, we devise a collaborative learning scheme that seamlessly integrates CNN and Transformer: residual convolutional blocks extract local features, which are then aggregated into the Transformer to model global features, enhancing its feature representation ability. Second, we construct a cross-modality interactive attention module, a cascade of Token-ViT and Channel-ViT. This module models long-range dependencies along the token and channel dimensions in an interactive manner and allows feature communication between spatial locations and independent channels. The generated global features focus markedly on the intrinsic characteristics of the different modality images, which effectively strengthens their complementary information and achieves better fusion performance. Finally, we train the fusion network end-to-end with a comprehensive objective function comprising a structural similarity index measure (SSIM) loss, a gradient loss, and an intensity loss. This design ensures that the fusion model preserves similar structural information, valuable pixel intensities, and rich texture details from the source images. To verify the effectiveness and superiority of the proposed method, we carry out experiments on three different datasets, namely TNO, M3FD, and Roadscene. Both subjective and objective results demonstrate clear advantages over seven state-of-the-art fusion methods in fusion performance, model generalization, and computational efficiency.
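To make the collaborative CNN-Transformer scheme concrete, the following is a minimal sketch of the idea described in the abstract: residual convolutional blocks extract local features, which are flattened into spatial tokens and passed through a standard Transformer encoder layer to model global dependencies. All module names, channel sizes, and hyperparameters here are illustrative assumptions, not taken from the paper.

```python
# Sketch: local (CNN) + global (Transformer) feature extraction, assuming
# single-channel inputs and an embedding width of 64. Hypothetical names.
import torch
import torch.nn as nn


class ResidualConvBlock(nn.Module):
    """Local feature extractor: two 3x3 convolutions with a skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.body(x))


class LocalGlobalEncoder(nn.Module):
    """CNN front end followed by a Transformer encoder over spatial tokens."""

    def __init__(self, in_channels: int = 1, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, dim, 3, padding=1)
        self.local = nn.Sequential(ResidualConvBlock(dim), ResidualConvBlock(dim))
        self.global_ctx = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=dim * 4, batch_first=True
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.local(self.stem(x))           # (B, C, H, W) local features
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C) spatial tokens
        tokens = self.global_ctx(tokens)          # global dependency modelling
        return tokens.transpose(1, 2).view(b, c, h, w)


if __name__ == "__main__":
    ir = torch.randn(1, 1, 64, 64)            # dummy infrared patch
    print(LocalGlobalEncoder()(ir).shape)     # torch.Size([1, 64, 64, 64])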
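The cross-modality interactive attention module is described as a cascade of Token-ViT and Channel-ViT. A hedged reading of that design is sketched below: a token-wise attention stage (across spatial positions) cascaded with a channel-wise stage (across feature channels), where queries come from one modality and keys/values from the other. The class name, the residual/normalization layout, and the single-stage channel attention are assumptions; the paper's exact internals are not reproduced here.

```python
# Sketch: token-wise then channel-wise cross-modality attention.
import torch
import torch.nn as nn


class InteractiveCrossAttention(nn.Module):
    """IR tokens query visible-image tokens, first spatially, then per channel."""

    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.token_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    @staticmethod
    def _channel_attention(q: torch.Tensor, kv: torch.Tensor) -> torch.Tensor:
        # Treat channels as the attention axis: (B, N, C) -> (B, C, N),
        # so the attention matrix is (B, C, C) over independent channels.
        qc, kc, vc = q.transpose(1, 2), kv.transpose(1, 2), kv.transpose(1, 2)
        scale = qc.shape[-1] ** -0.5
        attn = torch.softmax(qc @ kc.transpose(1, 2) * scale, dim=-1)
        return (attn @ vc).transpose(1, 2)        # back to (B, N, C)

    def forward(self, ir_tokens: torch.Tensor, vis_tokens: torch.Tensor) -> torch.Tensor:
        # Stage 1 (token dimension): spatial locations of one modality
        # gather long-range context from the other modality.
        x, _ = self.token_attn(ir_tokens, vis_tokens, vis_tokens)
        x = self.norm1(ir_tokens + x)
        # Stage 2 (channel dimension): the cascaded channel-wise interaction.
        x = self.norm2(x + self._channel_attention(x, vis_tokens))
        return x


if __name__ == "__main__":
    ir = torch.randn(2, 256, 64)    # (batch, tokens, channels)
    vis = torch.randn(2, 256, 64)
    print(InteractiveCrossAttention()(ir, vis).shape)  # torch.Size([2, 256, 64])
```

Running the module in both directions (IR querying visible and vice versa) and fusing the two outputs would give the symmetric feature exchange the abstract alludes to; that pairing is likewise an assumption.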
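The training objective combines an SSIM term, a gradient (texture) term, and a pixel-intensity term. One common formulation from the fusion literature is sketched below; the element-wise max aggregation of the two sources and the weights w_ssim, w_grad, w_int are assumptions, not the paper's exact choices. The SSIM term uses the third-party pytorch-msssim package.

```python
# Sketch: composite fusion loss (SSIM + gradient + intensity), assuming
# single-channel images scaled to [0, 1]. Weights are illustrative.
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # pip install pytorch-msssim


def sobel_gradient(img: torch.Tensor) -> torch.Tensor:
    """Gradient-magnitude proxy via fixed Sobel kernels (single-channel input)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    return F.conv2d(img, kx, padding=1).abs() + F.conv2d(img, ky, padding=1).abs()


def fusion_loss(fused, ir, vis, w_ssim=1.0, w_grad=10.0, w_int=10.0):
    # Structural term: stay structurally similar to both source images.
    l_ssim = (1 - ssim(fused, ir, data_range=1.0)) \
           + (1 - ssim(fused, vis, data_range=1.0))
    # Texture term: match the stronger gradient of the two sources.
    l_grad = F.l1_loss(sobel_gradient(fused),
                       torch.maximum(sobel_gradient(ir), sobel_gradient(vis)))
    # Intensity term: keep the salient (brighter) pixel intensities.
    l_int = F.l1_loss(fused, torch.maximum(ir, vis))
    return w_ssim * l_ssim + w_grad * l_grad + w_int * l_int


if __name__ == "__main__":
    ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    print(fusion_loss((ir + vis) / 2, ir, vis).item())
```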

Keywords: image fusion; self-attention mechanism; feature interaction; deep learning; multi-modality images

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
