Authors: WANG Yan (汪岩); YUAN Tiantian (袁甜甜)[1]; HU Bin (胡彬)[1,2]; LI Yao (李尧) (Technical College for the Deaf, Tianjin University of Technology, Tianjin 300384, China; School of Microelectronics, Tianjin University, Tianjin 300072, China)
Affiliations: [1] Technical College for the Deaf, Tianjin University of Technology, Tianjin 300384, China; [2] School of Microelectronics, Tianjin University, Tianjin 300072, China
Source: Infrared and Laser Engineering (《红外与激光工程》), 2024, No. 8, pp. 250-261 (12 pages)
Funding: Open Fund of the Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology (H20230328).
Abstract: High-precision 3D object detection based on multimodal sensors such as LiDAR and visible-light cameras is a key technology in autonomous driving. To improve detection accuracy and orientation perception while reducing the model's dependence on annotated data, this work uses a frustum-transform method to optimize the strategy for extracting orientation features from 3D point clouds, and proposes a 3D object detection technique based on the frustum transform and a semi-supervised learning architecture. Specifically, a channel attention module improves the frustum's perception of distant objects, and an RGB voxel module is proposed to raise the recognition accuracy for occluded objects. First, texture information is extracted from RGB images by a deep network and fused with LiDAR range information to preserve the integrity of the 3D spatial features. Second, a feature fusion module extracts the weights of the voxel spatial features. Finally, an adaptive pseudo-labeling method reduces the dependence on annotated samples, and a group-voting method further lowers the false-alarm rate. Experimental results show that the method achieves satisfactory performance on the KITTI dataset, with detection accuracies of 56.30% for pedestrians and 75.88% for vehicles. This study offers ideas for efficient 3D object detection in complex scenes and lays a foundation for further optimizing multimodal data fusion for autonomous driving.

Objective: In the field of autonomous driving, high-precision object detection is crucial for ensuring safety and efficiency. A common approach is to use voxel-based methods, which are sensitive to the quantization grid size: smaller grids make the algorithm more computationally intensive, while larger grids increase quantization loss, discarding precise position information and fine detail. Successive convolution and down-sampling operations further weaken the precise localization signals in the point cloud. To improve the orientation perception and accuracy of object detection, we propose a frustum transform-based method that extracts features from RGB images and fuses them with range information from LiDAR, optimizing the strategy for extracting orientation features from the 3D point cloud. To reduce the model's dependence on annotated data, we also design a semi-supervised learning architecture that employs an adaptive pseudo-labeling method; a group voting-based method further reduces the false alarm rate.

Methods: We propose a LiDAR-RGB fusion network based on the frustum transform (Fig.1). Specifically, texture information is extracted from the RGB image by a deep network and fused with range information from the LiDAR to maintain the integrity of the 3D spatial features (Fig.2). Subsequently, the weights of the voxel spatial features are optimized using the channel attention module (Fig.3). Finally, a semi-supervised learning architecture (Fig.4) is employed to reduce the false alarm rate by utilizing the spatial feature fusion module (Fig.5) and the group-based voting module. A contrastive learning module is used to improve the reliability of the detections.

Results and Discussions: The proposed method was evaluated on the KITTI dataset (Tab.1). It achieved 56.30% accuracy in pedestrian detection and 75.88% accuracy in vehicle detection, at a detection speed of 21 FPS. In the ablation study of the LRFN (LiDAR-RGB Fusion Network) model (Tab.2), the RVFM (RGB Voxel Feature Module)
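The abstract's frustum step pairs each 2D image detection with the LiDAR points falling inside the corresponding viewing frustum. The paper's implementation is not reproduced here; the following is a minimal NumPy sketch under simplifying assumptions (points already in the camera frame, an ideal pinhole intrinsic matrix K); the function name `frustum_points` is illustrative, not from the paper.

```python
import numpy as np

def frustum_points(points, K, box2d):
    """Select LiDAR points whose camera projection falls inside a 2D box.

    points : (N, 3) array of points in the camera frame (x right, y down,
             z forward); box2d = (u_min, v_min, u_max, v_max) in pixels.
    """
    u_min, v_min, u_max, v_max = box2d
    in_front = points[:, 2] > 0           # discard points behind the camera
    uvw = points @ K.T                    # pinhole projection to homogeneous pixels
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    mask = in_front & (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
    return points[mask]

# Toy intrinsics: focal length 100 px, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 10.0],    # projects to (50, 50): inside the box
                [5.0, 0.0, 10.0],    # projects to (100, 50): outside
                [0.0, 0.0, -5.0]])   # behind the camera: rejected
inside = frustum_points(pts, K, (0.0, 0.0, 80.0, 80.0))
```

Only the first point survives; the frustum crop is what lets the later fusion stages work on a small, object-centric subset of the cloud.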
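The channel attention module that reweights voxel spatial features (Fig.3) is, in general form, a squeeze-and-excitation pattern: globally pool each channel, pass the pooled vector through a small bottleneck MLP, and rescale the channels by the resulting sigmoid weights. A minimal NumPy sketch of that pattern (the weight shapes and reduction ratio are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(voxel_feats, w1, w2):
    """Squeeze-and-excitation style channel attention over a voxel volume.

    voxel_feats : (C, D, H, W) feature volume.
    w1 : (C//r, C), w2 : (C, C//r) -- weights of the excitation bottleneck MLP.
    Returns the channel-reweighted volume.
    """
    C = voxel_feats.shape[0]
    squeeze = voxel_feats.reshape(C, -1).mean(axis=1)  # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)             # ReLU bottleneck
    scale = sigmoid(w2 @ hidden)                       # per-channel weight in (0, 1)
    return voxel_feats * scale[:, None, None, None]    # broadcast over D, H, W

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4, 4))   # C=8 channels, reduction ratio r=4
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
out = channel_attention(feats, w1, w2)
```

Because each channel is multiplied by a weight in (0, 1), informative channels are preserved while weak ones are suppressed, which is the mechanism the abstract credits with improving perception of distant objects.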
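The group-based voting module filters pseudo-labels by requiring agreement among multiple predictions before a box is trusted, which is how the false alarm rate is reduced. A pure-Python sketch of one plausible form of such a filter, using 2D axis-aligned IoU for brevity (the paper operates on 3D boxes; thresholds and function names here are assumptions):

```python
def iou(a, b):
    """Axis-aligned IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def vote_pseudo_labels(proposals_per_view, iou_thr=0.5, min_votes=2):
    """Keep a candidate box only if enough voters (ensemble or augmented
    predictions) produce an overlapping box -- a simple group-voting filter."""
    kept = []
    for box in proposals_per_view[0]:
        votes = sum(
            any(iou(box, other) >= iou_thr for other in view)
            for view in proposals_per_view[1:]
        )
        if votes + 1 >= min_votes:        # the proposing view counts as one vote
            kept.append(box)
    return kept

views = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],  # view 1: one real box, one spurious
    [(1, 1, 11, 11)],                    # view 2 agrees on the real box
    [(0, 0, 9, 9)],                      # view 3 agrees on the real box
]
kept = vote_pseudo_labels(views)         # the spurious box gets no second vote
```

Only the box corroborated by other views survives as a pseudo-label; the unsupported detection is discarded, trading a little recall for precision on unlabeled data.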
Keywords: 3D object detection; RGB voxel features; frustum transform; semi-supervised learning; KITTI dataset
Classification: TP183 (Automation and Computer Technology / Control Theory and Control Engineering)