Authors: CHEN Yanlin; WANG Zhishe [1]; SHAO Wenyu; YANG Fan; SUN Jing (School of Applied Science, Taiyuan University of Science and Technology, Taiyuan 030024, China)
Affiliation: [1] School of Applied Science, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
Source: Infrared Technology, 2023, No. 3, pp. 266-275 (10 pages)
Funding: Shanxi Provincial Basic Research Program (201901D111260); Open Fund of the Shanxi Key Laboratory of Information Detection and Processing (ISPT2020-4).
Abstract: Mainstream deep-learning fusion methods employ convolutional operations to extract local image features; however, the interaction between an image and a convolution kernel is content-independent, and long-range dependencies cannot be modeled well. Consequently, important contextual information may be lost, which limits the fusion performance for infrared and visible images. To this end, we present a simple and effective multiscale Transformer fusion method for infrared and visible images (MsTFusion). We first design a novel Conv Swin Transformer block to model long-range dependencies, in which a convolutional layer improves the representational ability of global features. We then construct a multiscale self-attentional encoder-decoder network to extract and reconstruct global features without relying on local features. Moreover, we design a learnable fusion layer for feature sequences that uses softmax operations to compute attention weights over the feature sequences and highlight the salient features of each source image. The proposed method is an end-to-end, fully attentional model in which attention weights interact with image content. Experiments on the TNO and Roadscene datasets demonstrate that MsTFusion surpasses other typical traditional and deep-learning fusion methods in both subjective visual quality and objective metrics. By integrating the self-attention mechanism, our method builds a fully attentional fusion model that captures long-range dependencies for global feature extraction and reconstruction, overcoming the limitations of convolution-based deep models, and achieves remarkable fusion performance with strong generalization ability and competitive computational efficiency.
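The fusion step described in the abstract, in which softmax produces attention weights over the two encoders' feature sequences, can be illustrated with a short sketch. Below is a minimal PyTorch sketch, not the authors' implementation (MsTFusion's fusion layer is learnable); the per-token L1-activity score used before the softmax is an assumption, since the abstract does not specify how the pre-softmax scores are formed.

```python
import torch
import torch.nn as nn

class SoftmaxSequenceFusion(nn.Module):
    """Hypothetical sketch of a softmax-weighted feature-sequence fusion,
    in the spirit of the fusion layer described in the abstract."""

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        # feat_ir, feat_vis: (B, N, C) token sequences from the infrared
        # and visible encoder branches.
        # Assumption: mean absolute activation as a per-token saliency score.
        score_ir = feat_ir.abs().mean(dim=-1, keepdim=True)    # (B, N, 1)
        score_vis = feat_vis.abs().mean(dim=-1, keepdim=True)  # (B, N, 1)
        # Softmax across the two modalities yields attention weights
        # that sum to 1 at every token position.
        weights = torch.softmax(torch.cat([score_ir, score_vis], dim=-1), dim=-1)
        w_ir, w_vis = weights[..., 0:1], weights[..., 1:2]
        # Weighted blend keeps whichever modality is more salient per token.
        return w_ir * feat_ir + w_vis * feat_vis

# Usage: fuse two (batch, tokens, channels) sequences.
ir_tokens = torch.randn(2, 196, 96)
vis_tokens = torch.randn(2, 196, 96)
fused = SoftmaxSequenceFusion()(ir_tokens, vis_tokens)
print(fused.shape)  # torch.Size([2, 196, 96])
```

In the paper's pipeline the fused sequence would then go to the decoder for global feature reconstruction, and the fusion layer is trained end to end rather than fixed as in this sketch.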
Keywords: image fusion; Swin Transformer; self-attention mechanism; multiscale; infrared image
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]