A Multimodal 3D Object Detection Method Based on Double-Fusion Framework

Cited by: 4


Authors: GE Tong-ao (葛同澳); LI Hui (李辉)[1]; GUO Ying (郭颖); WANG Jun-yin (王俊印); ZHOU Di (周迪) (School of Data Science, Qingdao University of Science and Technology, Qingdao, Shandong 266000, China; School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, Hubei 430000, China)

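Affiliations: [1] School of Data Science, Qingdao University of Science and Technology, Qingdao, Shandong 266000, China; [2] School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, Hubei 430000, China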

Source: Acta Electronica Sinica (《电子学报》), 2023, No. 11, pp. 3100-3110 (11 pages)

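Funding: China University Industry-University-Research Innovation Fund (No.2021ITA05047); National Natural Science Foundation of China (No.62002190); Youth Innovation Science and Technology Support Program of Shandong Provincial Higher Education Institutions (No.2019KJN047).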

Abstract: 3D object detection based on multimodal fusion of camera and LiDAR data can exploit the complementary strengths of the two sensors to improve detection accuracy and robustness. However, because of environmental complexity and the inherent differences between multimodal data, 3D object detection still faces many challenges. This paper proposes a multimodal 3D object detection algorithm based on a double-fusion framework. A voxel-level and grid-level double-fusion framework is designed to effectively alleviate the semantic differences between modalities during fusion. An ABFF (Adaptive Bird-eye-view Features Fusion) module is proposed to enhance the algorithm's ability to perceive small-object features. With voxel-level global fusion information guiding grid-level local fusion, a Transformer-based multimodal grid feature encoder is proposed to extract richer contextual information from 3D detection scenes and to improve the algorithm's runtime efficiency. Experimental results on the KITTI standard dataset show that the proposed algorithm reaches an average detection accuracy of 78.79%, demonstrating better 3D object detection performance.
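The abstract names an adaptive bird's-eye-view (BEV) feature fusion step but does not describe how it works internally. As a rough illustration of the general idea of adaptive fusion only, the sketch below blends a LiDAR-derived and a camera-derived feature vector per BEV cell using softmax weights computed from the features themselves. It is not the paper's ABFF module: the function names (`fuse_cell`, `fuse_bev`), the linear scoring vectors `w_lidar` and `w_camera`, and the choice of a two-way softmax are all assumptions made for this example.

```python
import math
from typing import List, Sequence

Vector = List[float]
BevMap = List[List[Vector]]  # rows x cols x channels


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_cell(lidar_feat: Vector, camera_feat: Vector,
              w_lidar: Vector, w_camera: Vector) -> Vector:
    """Blend the two modality features of one BEV cell.

    Assumption: each modality gets a scalar score from a hypothetical
    linear scoring vector; a softmax turns the two scores into weights
    that sum to 1, so the output is a convex combination of the inputs.
    """
    if len(lidar_feat) != len(camera_feat):
        raise ValueError("both modalities must have the same channel count")
    alpha_l, alpha_c = softmax([dot(w_lidar, lidar_feat),
                                dot(w_camera, camera_feat)])
    return [alpha_l * l + alpha_c * c
            for l, c in zip(lidar_feat, camera_feat)]


def fuse_bev(lidar_bev: BevMap, camera_bev: BevMap,
             w_lidar: Vector, w_camera: Vector) -> BevMap:
    """Apply fuse_cell independently to every cell of two aligned BEV maps."""
    return [[fuse_cell(l, c, w_lidar, w_camera)
             for l, c in zip(l_row, c_row)]
            for l_row, c_row in zip(lidar_bev, camera_bev)]
```

With all-zero scoring vectors both modalities receive equal weight and the result is a plain average; in a trained network the scoring parameters would be learned, which is what lets the weighting adapt per location.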

Keywords: deep learning; 3D object detection; LiDAR; camera; multimodal information fusion

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
