Monocular 3D object detection method integrating depth and instance segmentation (基于深度与实例分割融合的单目3D目标检测方法)


Authors: SUN Xun (孙逊) [1]; FENG Ruifeng (冯睿锋); CHEN Yanru (陈彦如) [2]

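Affiliations: [1] Line Station Design and Research Institute, China Railway Siyuan Survey and Design Group Company Limited, Wuhan, Hubei 430063, China; [2] College of Economics and Management, Southwest Jiaotong University, Chengdu, Sichuan 610031, China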

Source: Journal of Computer Applications (《计算机应用》), 2024, No. 7, pp. 2208-2215 (8 pages)

Funding: National Natural Science Foundation of China (62173279).

Abstract: Monocular 3D object detection performs poorly when viewpoint changes alter the apparent size of objects and when objects are occluded. To address this, a monocular 3D object detection method that fuses depth information with instance segmentation masks was proposed. Firstly, a Depth-Mask Attention Fusion (DMAF) module combined depth information with instance segmentation masks to provide more accurate object boundaries. Secondly, dynamic convolution was introduced, and the fused features produced by the DMAF module guided the generation of the dynamic convolution kernels in order to handle objects of different scales. Thirdly, a 2D-3D bounding box consistency loss was added to the loss function so that each predicted 3D bounding box agrees closely with its corresponding 2D detection box, improving both instance segmentation and 3D object detection. Finally, the effectiveness of the method was verified through ablation experiments, and the method was evaluated on the KITTI test set. The experimental results show that, compared with methods using only depth estimation maps and instance segmentation masks, the proposed method improves the average precision for the vehicle category at moderate difficulty by 6.36 percentage points, and that it outperforms comparison methods such as D4LCN (Depth-guided Dynamic-Depthwise-Dilated Local Convolutional Network) and M3D-RPN (Monocular 3D Region Proposal Network) on both the 3D object detection and the bird's-eye-view object detection tasks.
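The abstract does not give the formula for the 2D-3D bounding box consistency loss, so the following is only a minimal sketch of one common way such a term can be formed: project the eight corners of the predicted 3D box into the image, take the enclosing rectangle, and penalise its mean absolute difference from the 2D detection box. The function names, the pinhole intrinsics `(fx, fy, u0, v0)`, the camera-coordinate convention (x right, y down, z forward) and the L1 penalty are all assumptions made for illustration, not the authors' implementation.

```python
import math

def box3d_corners(center, dims, yaw):
    """Eight corners of a 3D box in camera coordinates.

    Assumed convention: x right, y down, z forward; `center` is the
    geometric centre, dims = (h, w, l), yaw is a rotation about the y axis.
    """
    cx, cy, cz = center
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate the offset about the y axis, then translate.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners

def project_to_2d_box(corners, fx, fy, u0, v0):
    """Enclosing image rectangle (u_min, v_min, u_max, v_max) of the corners.

    Uses a pinhole model and assumes every corner has z > 0.
    """
    us = [fx * x / z + u0 for x, _, z in corners]
    vs = [fy * y / z + v0 for _, y, z in corners]
    return (min(us), min(vs), max(us), max(vs))

def consistency_loss(box3d, box2d, intrinsics):
    """Mean absolute difference between the projected 3D box and the 2D box."""
    center, dims, yaw = box3d
    projected = project_to_2d_box(box3d_corners(center, dims, yaw), *intrinsics)
    return sum(abs(p - q) for p, q in zip(projected, box2d)) / 4.0

# Hypothetical example: a 4 m long box, 10 m in front of the camera.
intrinsics = (100.0, 100.0, 0.0, 0.0)          # fx, fy, u0, v0 (made up)
box3d = ((0.0, 0.0, 10.0), (2.0, 2.0, 4.0), 0.0)
projected = project_to_2d_box(box3d_corners(*box3d), *intrinsics)
shifted = (projected[0] + 4.0, projected[1], projected[2] + 4.0, projected[3])
loss_same = consistency_loss(box3d, projected, intrinsics)
loss_shifted = consistency_loss(box3d, shifted, intrinsics)
```

In this sketch the loss is zero when the 2D box equals the projection of the 3D box and grows as the two drift apart, which is the behaviour a consistency term of this kind is meant to encourage. In a training setting the same computation would be written with differentiable tensor operations.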

Keywords: monocular 3D object detection; deep learning; dynamic convolution; instance segmentation

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
