3D Object Detection Based on Spherical Semantic Multi-modal Fusion

Authors: HAN Luyu; LIN Shanling; ZHAO Min; LIN Zhixian [1,2,3]; GUO Tailiang

Affiliations: [1] College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China; [2] Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou 350116, China; [3] School of Advanced Manufacturing, Fuzhou University, Quanzhou, Fujian 362200, China

Source: Optoelectronic Technology (《光电子技术》), 2025, No. 1, pp. 75-81 (7 pages)

Funding: National Key R&D Program of China (2021YFB3600603); Natural Science Foundation of Fujian Province (2020J01468).

Abstract: Current 3D object detection methods suffer from three problems: data augmentation prevents point clouds and images from being aligned effectively, point-to-point alignment loses image features, and localization and classification confidences are inconsistent. To address these problems, a multi-modal fusion method for 3D object detection is proposed. First, PointNet++ extracts point cloud features and a convolutional neural network extracts image features. Second, in the point cloud and image fusion stage, a semantic alignment method and image sphere features are used to achieve better cross-modal alignment between point clouds and images. An attention-based method also guides the fusion of point cloud and image features to obtain more reliable image features. Finally, the DIoU (Distance Intersection over Union) loss is introduced to mitigate the confidence inconsistency. Experimental results show that the proposed method clearly outperforms the baseline, and the mAP (mean Average Precision) for the Car category reaches 85.6% across the easy, moderate, and hard tasks.
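The abstract names the DIoU loss but does not give its formula. As a minimal sketch, assuming the commonly used definition (loss = 1 - IoU + d^2 / c^2, where d is the distance between box centers and c is the diagonal of the smallest enclosing box), the idea can be illustrated for axis-aligned 3D boxes. The function name `diou_loss_3d` and the box layout `(x_min, y_min, z_min, x_max, y_max, z_max)` are hypothetical choices for illustration; the paper's own formulation, which may handle rotated boxes, is not described in this record.

```python
def diou_loss_3d(box_a, box_b):
    """DIoU loss for two axis-aligned 3D boxes with positive volume.

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max).
    This layout is an assumption made for illustration only.
    """
    inter = 1.0   # intersection volume
    vol_a = 1.0   # volume of box_a
    vol_b = 1.0   # volume of box_b
    d2 = 0.0      # squared distance between box centers
    c2 = 0.0      # squared diagonal of the smallest enclosing box
    for i in range(3):
        a_lo, a_hi = box_a[i], box_a[i + 3]
        b_lo, b_hi = box_b[i], box_b[i + 3]
        # Overlap along this axis, clamped at zero for disjoint boxes.
        inter *= max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
        vol_a *= a_hi - a_lo
        vol_b *= b_hi - b_lo
        d2 += ((a_lo + a_hi) / 2 - (b_lo + b_hi) / 2) ** 2
        c2 += (max(a_hi, b_hi) - min(a_lo, b_lo)) ** 2
    iou = inter / (vol_a + vol_b - inter)
    # The center-distance penalty keeps the loss informative when IoU is 0.
    return 1.0 - iou + d2 / c2


pred = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
target = (2.0, 0.0, 0.0, 3.0, 1.0, 1.0)
loss = diou_loss_3d(pred, target)
```

Unlike a plain IoU loss, which is constant at 1 for every pair of non-overlapping boxes, the distance term here still changes as the predicted box moves toward the target.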

Keywords: LiDAR; color image; multi-modal fusion; autonomous driving

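Classification codes: TN957 [Electronics and Telecommunications: Signal and Information Processing]; TP394.1 [Electronics and Telecommunications: Information and Communication Engineering]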

 
