A Driving Policy Learning Method Based on a Multi-Sensing, Multi-Constraint Reward Mechanism (cited by: 5)

A driving decision-making approach based on multi-sensing and multi-constraints reward function

Authors: Zhong-li WANG [1]; Hao WANG; Yan SHEN [1]; Bai-gen CAI [1] (School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China)

Affiliation: [1] School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China

Source: Journal of Jilin University (Engineering and Technology Edition), 2022, No. 11, pp. 2718-2727 (10 pages)

Funding: Science and Technology Innovation 2030 Major Project (2022ZD0205000); National Natural Science Foundation of China, General Program (61573057, 61702032).

Abstract: Deep learning approaches and most deep reinforcement learning approaches adapt poorly to the complexity and variability of traffic scenes and cannot satisfy the requirements of real applications. To address this, the paper proposes a deep reinforcement learning approach based on multi-sensing input and a multi-constraint reward function under the SAC framework (MSMC-SAC). The inputs include front-view images, LiDAR data, and bird's-eye-view information generated from the perception results. An encoding network maps these inputs to a latent-space representation, and the reconstructed information serves as the input to driving policy learning. The reward function jointly considers several constraints, including lateral and longitudinal error, heading, smoothness, and driving speed, which effectively improves adaptability across scenes and the convergence speed of policy learning. Typical traffic scenarios were built in the CARLA simulation environment to verify the method's performance, and the multi-constraint reward mechanism was analyzed and compared. The results show that the proposed approach can make driving decisions for the vehicle in multiple scenarios and clearly outperforms comparable SOTA methods.

Keywords: vehicle engineering; deep reinforcement learning; driving policy; multi-reward function

Classification: U469.79 [Mechanical Engineering - Vehicle Engineering]
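
The abstract names the constraints that enter the reward (lateral and longitudinal error, heading, smoothness, speed) but not their functional form or weights. As a rough illustration only, the sketch below combines them as a weighted sum of absolute-error penalties. The function name, the `RewardWeights` fields, the default weight values, and the use of a steering change as the smoothness term are all assumptions made for this example, not the paper's formulation.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RewardWeights:
    # Hypothetical weights; the abstract does not report any values.
    lateral: float = 1.0
    longitudinal: float = 0.5
    heading: float = 0.5
    smoothness: float = 0.2
    speed: float = 1.0


def multi_constraint_reward(
    lateral_err: float,       # metres off the reference path, sideways
    longitudinal_err: float,  # metres off the reference point, along the path
    heading_err: float,       # radians between vehicle and path heading
    steer_delta: float,       # change in steering command since last step
    speed: float,             # current speed, m/s
    target_speed: float,      # desired speed, m/s
    weights: Optional[RewardWeights] = None,
) -> float:
    """Weighted sum of penalties; 0.0 is the best possible reward."""
    w = weights or RewardWeights()

    # Wrap the heading error into [-pi, pi] so 2*pi counts as no error.
    wrapped_heading = math.atan2(math.sin(heading_err), math.cos(heading_err))

    # Relative speed error, guarded against a zero target speed.
    speed_err = abs(speed - target_speed) / max(target_speed, 1e-6)

    return -(
        w.lateral * abs(lateral_err)
        + w.longitudinal * abs(longitudinal_err)
        + w.heading * abs(wrapped_heading)
        + w.smoothness * abs(steer_delta)
        + w.speed * speed_err
    )
```

In an SAC training loop a function like this would be evaluated once per environment step, using quantities derived from the simulator state. Which terms dominate depends entirely on the weights, which would need tuning per scenario.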
