Institution: [1] School of Instrument Science and Opto-Electronic Engineering, Beijing Information Science and Technology University, Beijing 100192, China
Source: Infrared and Laser Engineering, 2024, No. 11, pp. 322-336 (15 pages)
Fund: Beijing Natural Science Foundation (Youth Project) (4244105).
Abstract: An innovative triple multimodal infrared and visible image fusion algorithm is proposed to address the shortcomings of traditional convolution operations in capturing global features and analyzing long-range correlations. The core innovations of the algorithm are as follows. First, a difference image is introduced at the input end: pixel-value subtraction highlights the differences between the source images, and a triple-input network architecture is constructed to enhance the discriminability of image features. Second, a mixed difference convolution (MDconv), a variant of traditional convolution, is designed; it incorporates edge-detection operators and exploits the pixel-difference principle to improve the feature-learning capability of the convolution operation. Further, a dual-branch encoder is adopted, combining a convolutional neural network branch built from densely connected mixed difference convolutions with an Efficient Vision Transformer (EfficientViT) branch, which extract the local details and the global background of the images respectively, achieving comprehensive capture of both local and global features. Finally, a multi-dimensional coordinate collaborative attention fusion strategy is employed in the fusion layer to effectively integrate the multimodal image features output by the encoders. Qualitative and quantitative experiments on public datasets show that images fused by the proposed algorithm exhibit clear background texture details and more salient thermal radiation targets, achieve the best values on four objective evaluation metrics (MI, VIF, SD, QAB/F), and obtain the second-best value on SF. Ablation studies further verify the effectiveness of each proposed module.

Objective: By leveraging the complementarity between infrared and visible light images, infrared and visible image fusion integrates images obtained from different sensors in the same scene into a fused image that is information-rich, highly reliable, and specifically targeted, providing a comprehensive description and integration of the image information in the scene. The fused image retains both the thermal radiation targets of the infrared image and the detailed texture information of the visible image. However, existing deep learning-based fusion methods use convolutional neural networks as the basic framework, such as the encoder in autoencoder methods and the generator and discriminator in generative adversarial networks, all of which process the input image features with large stacks of convolutional layers. Traditional convolution operations, limited by the size of the convolution kernel and its receptive field, have very limited capability in extracting image features: they focus only on local features, such as the local edges of thermal radiation target regions in infrared images, and cannot well preserve global features, including the rich texture background in visible images and the contour information of objects or the environment in the scene. This one-sidedness of feature extraction leads to blurred background details and insufficiently prominent thermal radiation targets in the fused image. Therefore, there is an urgent need for a multimodal fusion method that extracts both global and local features to remedy these deficiencies.

Methods: A triple multimodal image fusion algorithm based on mixed difference convolution and an efficient vision Transformer network is proposed. The core innovations of this algorithm are as follows. First, at the input end, difference images are introduced to highlight the differences between the source images through pixel-value subtraction, constructing a triple-input network architecture that enhances the discriminability of image features.
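The triple-input construction described above amounts to a pixel-wise subtraction of the two registered source images. A minimal PyTorch sketch follows; the tensor layout, value range, and the use of an absolute difference are assumptions for illustration, not details stated in the paper.

```python
import torch

def build_triple_input(ir: torch.Tensor, vis: torch.Tensor):
    """Form the three network inputs from registered single-channel images.

    ir, vis: tensors of shape (B, 1, H, W) with values in [0, 1] (assumed layout).
    """
    diff = torch.abs(ir - vis)  # pixel-value subtraction highlights inter-modality differences
    return ir, vis, diff        # the three streams of the triple-input architecture
```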
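For the mixed difference convolution (MDconv), the excerpt only states that it combines edge-detection operators with the pixel-difference principle. The sketch below therefore follows the common central-difference convolution formulation, blending a learnable 3x3 convolution with a difference term; the blending weight `theta` and the class name are hypothetical, not the paper's definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferenceConv2d(nn.Module):
    """Pixel-difference convolution sketch (central-difference style, an assumption).

    y = conv(x) - theta * (sum of kernel weights) * x_center, i.e. a vanilla
    convolution blended with a convolution over pixel differences.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3,
                 padding: int = 1, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.theta = theta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)  # vanilla convolution over the local neighbourhood
        # conv(x - x_center) = conv(x) - (sum of kernel weights) * x_center,
        # implemented here as a 1x1 convolution with the summed kernel.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        center = F.conv2d(x, kernel_sum)
        return out - self.theta * center
```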
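The fusion layer is described only as a multi-dimensional coordinate collaborative attention, so the following sketch applies the standard coordinate attention formulation (direction-aware pooling along height and width) to the combined encoder features. The element-wise pre-fusion of the two branches and the reduction ratio are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class CoordAttentionFusion(nn.Module):
    """Coordinate-attention-style fusion of two encoder feature maps (a sketch)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        x = feat_ir + feat_vis                        # simple pre-fusion of the two branches (assumption)
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                          # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)      # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # height attention (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # width attention  (B, C, 1, W)
        return x * a_h * a_w                          # attention-weighted fused features
```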
Keywords: difference convolution; Efficient Vision Transformer; attention mechanism; image fusion; infrared and visible images
CLC Number: TP391 [Automation and Computer Technology - Computer Application Technology]