Authors: YE Zhihui (叶志晖); WU Jian (武健); ZHAO Xiaozhong (赵晓忠) [1]; WANG Wenjuan (王文娟) [1]; SHAO Xinguang (邵新光) (China Tobacco Zhejiang Industrial Co., Ltd., Hangzhou 310008, China; Polytechnic Institute, Zhejiang University, Hangzhou 310058, China)
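Affiliations: [1] China Tobacco Zhejiang Industrial Co., Ltd., Hangzhou 310008, Zhejiang, China; [2] Polytechnic Institute, Zhejiang University, Hangzhou 310058, Zhejiang, China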
Source: Infrared Technology (《红外技术》), 2025, No. 4, pp. 468-474 (7 pages)
Funding: National Natural Science Foundation of China (62002320).
Abstract: To improve object detection performance in complex scenes, this work combines deep learning with multimodal information fusion and proposes a multimodal object detection model based on feature interaction and adaptive grouping fusion. The model takes infrared and visible-light object images as input, builds a symmetric dual-branch feature extraction structure on the PP-LCNet network, and introduces a feature interaction module so that the features of the two modalities complement each other during extraction. Next, a binary grouping attention mechanism is designed: global pooling combined with the Sign function groups the output features of the interaction module by the object category they belong to, and a spatial attention mechanism then enhances the object information in each feature group. Finally, feature groups of the same category are extracted at different scales from the group-enhanced features and fused across scales, from deep to shallow, by adaptive weighting; objects are predicted from the fused features at each scale. Experimental results show that the proposed method substantially improves multimodal feature interaction, key feature enhancement, and multi-scale fusion, and that the model is more robust in complex scenes and applies better across different scenarios.
Keywords: multimodal; object detection; feature interaction; binary grouping; adaptive fusion
Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]
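
The abstract names two operations, grouping features with global pooling plus the Sign function and fusing scales by adaptive weighting, without giving their formulas. The following is only a minimal sketch of how such steps might look, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`global_avg_pool`, `binary_group`, `adaptive_fuse`), the choice of average pooling, the two-way split on the sign of the pooled value, and the softmax normalisation of the fusion weights. Resizing the deep map to the shallow map's resolution is assumed to have happened already, and spatial attention is omitted.

```python
import math


def global_avg_pool(channel):
    """Global average pooling: mean of one 2-D feature map (a list of rows)."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def sign(x):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (x > 0) - (x < 0)


def binary_group(features):
    """Split channel indices into two groups by the sign of the pooled response.

    Assumption: channels with a positive pooled value form one group and all
    others form the second group. The paper's actual grouping rule may differ.
    """
    positive, non_positive = [], []
    for idx, channel in enumerate(features):
        if sign(global_avg_pool(channel)) > 0:
            positive.append(idx)
        else:
            non_positive.append(idx)
    return positive, non_positive


def adaptive_fuse(deep, shallow, logits):
    """Fuse two same-sized feature maps with softmax-normalised weights.

    `logits` holds two scalars (hypothetical stand-ins for learned parameters);
    the softmax keeps the two weights positive and summing to one.
    """
    exp_deep, exp_shallow = math.exp(logits[0]), math.exp(logits[1])
    total = exp_deep + exp_shallow
    w_deep, w_shallow = exp_deep / total, exp_shallow / total
    return [
        [w_deep * d + w_shallow * s for d, s in zip(deep_row, shallow_row)]
        for deep_row, shallow_row in zip(deep, shallow)
    ]
```

For example, `binary_group` applied to three 2x2 channels returns two lists of channel indices, and `adaptive_fuse` with equal logits reduces to a plain average of the two maps. In a trained model the logits would be learned per scale rather than fixed by hand.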