Monocular 3D object detection algorithm combining depth guidance and a multi-scale channel attention mechanism


Authors: LIU Qing (刘青); LI Wei (李伟); YU Shaoyong (余少勇); SONG Yuping (宋宇萍); ZHOU Qidi (周启迪); ZOU Weilin (邹伟林)

Affiliations: [1] School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, Fujian, China; [2] School of Mathematics and Information Engineering, Longyan University, Longyan 364012, Fujian, China; [3] School of Mathematical Sciences, Xiamen University, Xiamen 361005, Fujian, China

Source: Journal of Shandong University (Natural Science), 2025, No. 1, pp. 63-73, 82 (12 pages in total)

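Funding: Humanities and Social Sciences Research Planning Fund of the Ministry of Education (23YJAZH067); China Scholarship Council funded project (202308350042); Xiamen Science and Technology Bureau industry-university-research funded project (2023CXY0409); Xiamen University of Technology graduate education and teaching reform research project (YJS20220617).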

Abstract: Three-dimensional bounding boxes are difficult to estimate accurately from a monocular image, which lacks spatial cues. To address this problem, a monocular 3D object detection algorithm based on depth guidance and a multi-scale channel attention mechanism is proposed. To introduce 3D information and to capture and exploit the spatial information in feature maps of different scales, the feature extraction module applies a multi-scale (pyramid) split attention algorithm to extract multi-scale preprocessed feature maps from the monocular image and from the depth map separately, and then calibrates their weights with a channel attention algorithm, which improves the representational capacity of the feature maps. A depth-guided dynamic local convolution network then uses the depth-map features, which carry multi-scale information, as specific convolution kernels for the monocular image features. Introducing 3D information as guidance in this way reduces the error accumulation caused by direct fusion and addresses the scale sensitivity of monocular vision, in which nearby objects appear large and distant objects appear small. The performance of the model is evaluated and compared using several evaluation metrics. Experimental results show that, compared with other algorithms, the proposed algorithm improves the average precision of 3D object detection for cars, pedestrians, and cyclists on an autonomous driving dataset.
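The two mechanisms named in the abstract can be illustrated with a toy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names `channel_attention` and `depth_guided_local_conv` are hypothetical, the channel weights are taken as a plain softmax over global average activations (the paper's multi-scale split and learned layers are omitted), and the local convolution reads each pixel's kernel directly from the depth feature's neighbourhood with zero padding.

```python
import math


def channel_attention(feature_maps):
    """Reweight channels by a softmax over their global average activations.

    feature_maps: list of channels, each a 2D list (H x W) of floats.
    Assumption: no learned layers; the weights come straight from the means.
    """
    means = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    peak = max(means)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in means]
    total = sum(exps)
    weights = [e / total for e in exps]
    scaled = [
        [[w * v for v in row] for row in ch]
        for ch, w in zip(feature_maps, weights)
    ]
    return scaled, weights


def depth_guided_local_conv(image_feat, depth_feat, k=3):
    """Filter image_feat with per-pixel kernels read from depth_feat.

    At each location, the k x k neighbourhood of depth_feat acts as the
    kernel for the same neighbourhood of image_feat. Taps outside the map
    contribute zero, and the sum is averaged over k * k taps.
    """
    h, w = len(image_feat), len(image_feat[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += depth_feat[yy][xx] * image_feat[yy][xx]
            out[y][x] = acc / (k * k)
    return out
```

Because the kernel at each pixel is taken from the depth features rather than shared across the image, the filtering adapts to local scene geometry, which is the intuition behind using depth as guidance instead of concatenating it with the image features.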

Keywords: monocular 3D object detection; depth guidance; multi-scale channel attention mechanism; autonomous driving

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
