Spatial Semantic Fusion Network for Autonomous Driving Visual Joint Perception

Authors: WANG Yue; CAO Jiale; SUN Xuebin; WANG Jian; PANG Yanwei (School of Electrical and Information Engineering, Tianjin University, Tianjin, China)

Affiliation: [1] School of Electrical and Information Engineering, Tianjin University, Tianjin, China

Source: Journal of Taiyuan University of Technology, 2025, No. 2, pp. 338-347 (10 pages)

Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0160400); National Natural Science Foundation of China (C0049120).

Abstract: 【Purposes】As a key component of autonomous driving, the visual joint perception system performs multiple tasks in driving scenes, including traffic object detection, drivable area segmentation, and lane detection, and must strike a reasonable balance between accuracy and speed in practical deployment. The autonomous driving visual joint perception framework YOLOP achieves excellent real-time performance, but it suffers from feature conflicts between different scales of the feature pyramid and from the loss of texture details during downsampling. To alleviate these problems, this paper proposes a spatial semantic fusion network for autonomous driving visual joint perception (SSFJP), which takes spatial semantic embedding and fusion as its core and improves YOLOP's original semantic fusion network in two respects: feature enhancement and feature fusion.

【Methods】For feature enhancement, a bidirectional attention information strength module (BAISM) is proposed to reduce the spatial information loss incurred when generating multi-scale feature maps. It models the global contextual prior and the corresponding precise positional information along the two orthogonal (horizontal and vertical) dimensions, embedding channel-attention semantics into spatial details, which effectively highlights critical regions and improves the texture-detail representation of feature maps. For feature fusion, a multi-branch cascade feature fusion (MCFF) module is designed to mitigate the mutual interference between corresponding spatial positions across feature levels. It enlarges the receptive field with atrous convolutions of different dilation rates and exponentially weighted pooling, cascades the fusion of spatial contextual semantics, and uses dynamic convolution to adaptively aggregate multi-scale scene features, so that texture details and high-level semantics complement each other. In addition, to address the imbalanced training of the sub-tasks, adaptive parameters are introduced to re-weight the loss function coefficients, effectively improving the network's detection and segmentation performance.

【Findings】Experiments on the BDD100K dataset show that, compared with YOLOP, the proposed visual joint perception model preserves real-time inference while improving average precision by 8.9% for lane detection and 1.6% for object detection.
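The BAISM described in the abstract is close in spirit to coordinate attention: global context is pooled along each spatial axis while precise positions along the other axis are preserved, and the resulting channel attention is folded back into the spatial map. Below is a minimal PyTorch sketch of that idea; the class name, the mean-pooling choice, and the reduction ratio are assumptions, since the abstract does not give the exact BAISM design.

```python
# Minimal sketch of bidirectional (horizontal/vertical) attention.
# All design choices below are assumptions, not the paper's exact BAISM.
import torch
import torch.nn as nn

class BidirectionalAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(channels // reduction, 8)
        # Shared transform over the concatenated directional descriptors.
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        # Per-direction projections back to the full channel count.
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Aggregate global context along each orthogonal axis while keeping
        # precise positions along the other axis.
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # Channel-attention semantics embedded back into spatial detail.
        a_h = torch.sigmoid(self.conv_h(y_h))                        # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # (n, c, 1, w)
        return x * a_h * a_w
```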
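The MCFF module can likewise be sketched as cascaded atrous-convolution branches whose outputs are aggregated with input-dependent weights. In the sketch below, the dilation rates, the plain average-pooling branch (standing in for the exponentially weighted pooling), and the softmax gate (a lightweight stand-in for dynamic convolution) are all assumptions.

```python
# Minimal sketch of multi-branch cascade feature fusion: each atrous branch
# refines the previous branch's output, and a gate produces per-branch
# weights conditioned on the input. Not the paper's exact MCFF.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchCascadeFusion(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # Global pooling branch approximating a very large receptive field.
        self.pool_proj = nn.Conv2d(channels, channels, 1)
        # Input-conditioned weights over the branches (a lightweight stand-in
        # for dynamic convolution).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(dilations) + 1, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs, y = [], x
        for branch in self.branches:  # cascade: each branch refines the last
            y = branch(y)
            outs.append(y)
        g = self.pool_proj(F.adaptive_avg_pool2d(x, 1))
        outs.append(g.expand_as(x))
        w = torch.softmax(self.gate(x), dim=1)            # (n, B, 1, 1)
        stacked = torch.stack(outs, dim=1)                # (n, B, c, h, w)
        return (w.unsqueeze(2) * stacked).sum(dim=1) + x  # weighted fusion + skip
```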
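For the adaptive loss weighting, a common instance of learning per-task weighting coefficients jointly with the network is homoscedastic-uncertainty weighting (Kendall et al., 2018), sketched below. Whether SSFJP uses this exact formulation is an assumption; the abstract only states that adaptive parameters re-weight the sub-task losses.

```python
# Sketch of uncertainty-based multi-task loss weighting; the paper's exact
# adaptive scheme is not specified in the abstract.
import torch
import torch.nn as nn

class AdaptiveMultiTaskLoss(nn.Module):
    def __init__(self, num_tasks: int = 3):
        super().__init__()
        # One learnable log-variance s_i = log(sigma_i^2) per task.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        # total = sum_i exp(-s_i) * L_i + s_i; tasks with high uncertainty
        # are automatically down-weighted, balancing the sub-tasks.
        total = torch.zeros((), dtype=self.log_vars.dtype)
        for loss, s in zip(task_losses, self.log_vars):
            total = total + torch.exp(-s) * loss + s
        return total

# Usage (task order assumed): criterion = AdaptiveMultiTaskLoss(3)
# loss = criterion([det_loss, drivable_loss, lane_loss])
```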

Keywords: autonomous driving; visual joint perception; semantic fusion; bidirectional attention information enhancement; multi-task; multi-scale

Classification: TP391.4 [Automation and Computer Technology - Computer Application Technology]