3D Object Detection Method Based on Adaptive Weighted Fusion of Lidar and Camera


Authors: DONG Yuting (董钰婷) [1,2]; GUAN Lei (官磊) [1,2]

Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; [2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China

Source: Journal of Computer Applications (《计算机应用》), 2024, Issue S01, pp. 250-255 (6 pages)

Abstract: 3D object detection based on the fusion of lidar and camera data is widely used in autonomous driving. However, most fusion methods simply combine different sensors and ignore the fact that each sensor's perception capability changes with the environment, which makes it difficult to accurately detect low-resolution objects such as pedestrians. To address this problem, a 3D object detection method based on adaptive weighted fusion of lidar and camera data is proposed. First, a ResNet50+RPN (Region Proposal Network) backbone extracts multi-scale semantic features from the image, while a dynamic voxel feature encoder aggregates the raw point cloud into point cloud features. Next, self-attention and cross-attention fuse the semantic features and the point cloud features, adaptively assigning weights to the two feature maps. Finally, the fused point features are passed to the single-stage detector SECOND (Sparsely Embedded CONvolutional Detection) for bounding box regression and classification, and the detection results are validated on the KITTI dataset. Experimental results show that, at the easy, moderate, and hard difficulty levels, the multi-modal fusion method substantially improves detection precision for both cars and pedestrians compared with the original SECOND model, with the largest gain for pedestrians. The proposed method also achieves higher precision than many mainstream 3D object detection networks.
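The fusion step described in the abstract (self-attention and cross-attention followed by adaptive weighting of the two feature maps) can be illustrated with a minimal NumPy sketch. This is not the authors' network: the function names, the two-way softmax gate, the gate parameters `w_gate` and `b_gate`, and all tensor shapes are assumptions made for illustration, and the random inputs stand in for real voxel and image features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Scaled dot-product attention: each query row attends over all key rows.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

def adaptive_weighted_fusion(point_feats, image_feats, w_gate, b_gate):
    # point_feats: (N, C) point cloud features, image_feats: (M, C) image features.
    # w_gate: (2C, 2) and b_gate: (2,) are hypothetical gate parameters.
    # Self-attention over the point features.
    point_ctx = attention(point_feats, point_feats, point_feats)
    # Cross-attention: point features query the image features.
    image_ctx = attention(point_feats, image_feats, image_feats)
    # Assumed gate: per-point softmax over the two modalities.
    logits = np.concatenate([point_ctx, image_ctx], axis=1) @ w_gate + b_gate
    weights = softmax(logits, axis=1)  # (N, 2), each row sums to 1
    fused = weights[:, :1] * point_ctx + weights[:, 1:] * image_ctx
    return fused, weights

# Toy usage with random stand-in features (assumed sizes).
rng = np.random.default_rng(0)
N, M, C = 5, 7, 8
point_feats = rng.normal(size=(N, C))
image_feats = rng.normal(size=(M, C))
w_gate = rng.normal(size=(2 * C, 2))
b_gate = np.zeros(2)
fused, weights = adaptive_weighted_fusion(point_feats, image_feats, w_gate, b_gate)
```

In this sketch the per-point weights let the fused feature lean toward whichever modality the gate scores higher, which is one simple way to express the idea that sensor reliability varies with the scene. In a trained model the gate parameters would be learned rather than sampled.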

Keywords: lidar; multi-modal fusion; 3D object detection; attention mechanism; autonomous driving; deep learning

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
