Precise Decision-Making Learning for Automated Vehicles in Lane-Change Scenario Based on Parameter Description

Authors: ZHANG Yuxiang [1]; HE Ganglei [1]; LI Xin [1]; LIU Qifang [1]; CONG Yanfeng [1]; WANG Yuhai [1]

Affiliation: [1] State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China

Source: Journal of Tongji University (Natural Science), 2021, Issue S01, pp. 132-140 (9 pages)

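Funding: National Natural Science Foundation of China, Young Scientists Fund (61803173); Jilin Province Young and Middle-Aged Science and Technology Innovation Leading Talent and Team Project (20200301011RQ).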

Abstract: To improve driving safety and take full account of human drivers' acceptance of automated driving control systems, this paper studies and realizes precise decision-making learning for automated vehicles in the lane-change scenario. An automated vehicle must decide not only whether to change lanes but also its specific microscopic behaviors, such as the lane-change time and the expected acceleration. A precise lane-change decision is therefore described with three parameters and solved by reinforcement learning (RL). The rationality of such parameter-based precise decisions is shown in two aspects. First, different values of the decision parameters notably influence the planned trajectory, so leaving them undecided in the decision-making layer introduces significant motion uncertainty. Second, an analysis of real traffic data (NGSIM) shows that human lane-change behaviors vary significantly in lane-change time and expected acceleration, which is seldom explicitly considered in current decision-making research. The analysis also finds potential emergency situations in the NGSIM data whose safety can be improved by optimizing some of the decision parameters. In the design of the RL algorithm, the lane-change time and the expected acceleration are included in the action, and the reward function considers safety, the current driver's willingness, and the average human driving style. To solve the problem, custom basis functions are defined, and precise, safe decision behaviors are learned with kernel-based least-squares policy iteration (KLSPI). Simulation results demonstrate that learning decision parameters with RL realizes more precise decisions, improves safety performance, and can imitate human drivers' behaviors in the lane-change scenario.
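The abstract's first argument, that the decision parameters shape the planned trajectory, can be illustrated with a small sketch. It is not the authors' planner: the quintic lateral profile, the constant-acceleration longitudinal model, the 3.5 m lane width, and the names `LaneChangeDecision`, `lateral_offset`, and `plan_trajectory` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneChangeDecision:
    """Hypothetical container for the three decision parameters."""
    change_lane: bool        # whether to change lanes at all
    lane_change_time: float  # seconds allotted to the lateral move
    expected_accel: float    # longitudinal acceleration in m/s^2


def lateral_offset(t: float, duration: float, lane_width: float) -> float:
    """Assumed quintic profile: moves from 0 to lane_width with zero
    lateral velocity and acceleration at both ends."""
    s = min(max(t / duration, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    return lane_width * (10 * s**3 - 15 * s**4 + 6 * s**5)


def plan_trajectory(decision, v0, lane_width=3.5, horizon=8.0, dt=0.5):
    """Return (t, x, y) samples for one decision.

    x follows constant acceleration from initial speed v0 (assumed model);
    y follows the quintic profile when a lane change is chosen, else 0.
    """
    points = []
    steps = int(round(horizon / dt))
    for k in range(steps + 1):
        t = k * dt
        x = v0 * t + 0.5 * decision.expected_accel * t * t
        if decision.change_lane:
            y = lateral_offset(t, decision.lane_change_time, lane_width)
        else:
            y = 0.0
        points.append((t, x, y))
    return points


# Two decisions that both say "change lanes" but differ in the other two parameters.
quick = plan_trajectory(LaneChangeDecision(True, 3.0, 1.0), v0=20.0)
slow = plan_trajectory(LaneChangeDecision(True, 6.0, 0.0), v0=20.0)
```

Under these assumptions, both decisions agree on changing lanes, yet three seconds in the quick maneuver has already reached the target lane while the slow one is only halfway across. A planner given only the binary decision would have to guess between such trajectories, which is the uncertainty the abstract describes.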

Keywords: automated vehicles; driving decision-making; real traffic data; lane-change scenario

Classification number: U471.1 [Mechanical Engineering: Vehicle Engineering]
