Multi-modal fusion object detection based on gradient operator and attention (基于梯度算子和注意力的多模态融合目标检测)

Authors: Li Xuezhao (李学钊), Wang Wei (王伟) [1], Xue Bing (薛冰) (School of Intelligent Science and Engineering, Harbin Engineering University, Harbin 150000, China)

Affiliation: [1] School of Intelligent Science and Engineering, Harbin Engineering University, Harbin 150000

Source: Chinese Journal of Scientific Instrument (《仪器仪表学报》), 2024, No. 11, pp. 224-232 (9 pages)

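Funding: Supported by the Zhuimeng (Dream-Chasing) Fund project of the Jianghuai Frontier Technology Collaborative Innovation Center (2023ZM01Z025).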

Abstract: Infrared and visible images have complementary characteristics, so fusing the two modalities can meet the high accuracy and robustness that object detection requires in applications such as autonomous driving. However, existing multimodal object detection algorithms often have large models and long inference times, which makes them unsuitable for deployment on edge devices, while simpler approaches such as direct fusion fail to fully exploit the strengths of each modality. To address these problems, we propose a fusion object detection algorithm based on a gradient operator and an attention mechanism. A gradient operator is used to design a customized convolution that captures image texture; coordinate attention is introduced in the infrared branch to exploit that modality's advantage in target localization; and a weight generation network adaptively weights and fuses the features of the two modalities. The algorithm is modular and lightweight, making it suitable for deployment on edge devices. Experiments on the evaluation dataset show that the method improves mAP@0.50 and mAP@0.5:0.95 by 6.3% and 7.2%, respectively, over visible-only single-modal detection, and by 11.3% and 9.8% over infrared-only detection. The inference frame rate reaches 22.7 FPS, meeting real-time requirements.
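The abstract names three components but does not specify their internals. The NumPy sketch below shows one way such a pipeline could be wired, under loudly stated assumptions: a Sobel operator stands in for the unspecified gradient operator (added residually to the features of both branches), coordinate attention (Hou et al., CVPR 2021) is simplified by dropping BatchNorm and using ReLU instead of h-swish, and the weight generation network is assumed to be a single linear layer plus softmax over global-average descriptors. All function names, weight shapes and the random stand-in features are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Sobel kernels, one common gradient operator (ASSUMPTION: the abstract does
# not say which operator the paper uses).
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def conv2d_same(img, kernel):
    """3x3 cross-correlation of an (H, W) array with zero padding ('same' size)."""
    h, w = img.shape
    padded = np.pad(img, 1)
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def gradient_features(x):
    """Per-channel Sobel gradient magnitude of a (C, H, W) feature map."""
    gx = np.stack([conv2d_same(ch, SOBEL_X) for ch in x])
    gy = np.stack([conv2d_same(ch, SOBEL_Y) for ch in x])
    return np.sqrt(gx ** 2 + gy ** 2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Simplified coordinate attention on a (C, H, W) map.

    w_reduce (C_mid, C), w_h and w_w (C, C_mid) play the role of 1x1 convs.
    BatchNorm is dropped and ReLU stands in for h-swish (simplification).
    """
    h = x.shape[1]
    pool_h = x.mean(axis=2)                       # (C, H): average over width
    pool_w = x.mean(axis=1)                       # (C, W): average over height
    y = np.concatenate([pool_h, pool_w], axis=1)  # (C, H + W)
    y = np.maximum(w_reduce @ y, 0.0)             # shared 1x1 conv + ReLU
    a_h = sigmoid(w_h @ y[:, :h])                 # (C, H) weights along height
    a_w = sigmoid(w_w @ y[:, h:])                 # (C, W) weights along width
    return x * a_h[:, :, None] * a_w[:, None, :]


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def adaptive_fusion(f_vis, f_ir, w_gen):
    """HYPOTHETICAL weight-generation net: a linear layer over both modalities'
    global-average descriptors, softmax-normalised into two fusion weights."""
    desc = np.concatenate([f_vis.mean(axis=(1, 2)), f_ir.mean(axis=(1, 2))])
    weights = softmax(w_gen @ desc)               # shape (2,), sums to 1
    return weights[0] * f_vis + weights[1] * f_ir, weights


rng = np.random.default_rng(0)
C, H, W, C_MID = 8, 16, 16, 4
vis = rng.standard_normal((C, H, W))   # stand-in visible-branch features
ir = rng.standard_normal((C, H, W))    # stand-in infrared-branch features

# Texture enhancement via a residual gradient term (assumed form), both branches.
vis_feat = vis + gradient_features(vis)
ir_tex = ir + gradient_features(ir)
# Coordinate attention only on the infrared branch, as the abstract describes.
ir_feat = coordinate_attention(ir_tex,
                               0.1 * rng.standard_normal((C_MID, C)),
                               0.1 * rng.standard_normal((C, C_MID)),
                               0.1 * rng.standard_normal((C, C_MID)))
fused, weights = adaptive_fusion(vis_feat, ir_feat,
                                 0.1 * rng.standard_normal((2, 2 * C)))
print(fused.shape, weights)
```

Normalising the two weights with a softmax keeps them non-negative and summing to one, so the fused map stays a convex combination of the two branches; whether the paper normalises its weights this way is not stated in the abstract.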

Keywords: object detection; dual-modal feature fusion; gradient operator; attention mechanism

CLC numbers: TH741 [Mechanical Engineering / Optical Engineering]; TP391.41 [Mechanical Engineering / Instrument Science and Technology]
