TD3 Algorithm Improving and Lane-merging Strategy Learning for Autonomous Vehicles

Cited by: 4

Authors: ZHANG Zhiyong [1,2]; HUANG Dayang; HUANG Caixia [1,3]; HU Lin; DU Ronghua [2]

Affiliations: [1] Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment, Changsha University of Science and Technology, Changsha 410114; [2] College of Automobile and Mechanical Engineering, Changsha University of Science and Technology, Changsha 410114; [3] Hunan Provincial Key Laboratory of Automotive Power and Transmission System, Hunan Institute of Technology, Xiangtan 411104

Source: Journal of Mechanical Engineering, 2023, No. 8, pp. 224-234 (11 pages)

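Funding: Supported by the National Natural Science Foundation of China (61973047); the Natural Science Foundation of Hunan Province (2021JJ30182, 2022JJ50020); the Scientific Research Project of the Hunan Provincial Department of Education (20A018); and the Open Fund of the Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment (Changsha University of Science and Technology) (2020YB02).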

Abstract: To improve the overall performance of automated lane-merging strategies, the Q-value estimation method and the reward function of the twin delayed deep deterministic policy gradient (TD3) algorithm are improved. The vehicle lane-merging process is modeled as a reinforcement learning problem through a Markov decision process, and the influence of Q-value underestimation in TD3 on lane-merging decisions is analyzed. Monte Carlo dropout is applied to the twin critic target networks of TD3 to obtain two samples of Q-value estimates, and a Q-value estimation method based on a sample-variance-weighted average is proposed to improve the accuracy of Q-value estimation. With priority given to completing the lane-merging task, a comprehensive reward function is established that accounts for safety, comfort, and traffic efficiency during merging. Based on the improved TD3 algorithm and reward function, a lane-merging strategy for autonomous vehicles is learned and tested in the BARK simulator. The results show that the improved TD3 algorithm significantly improves the accuracy of Q-value estimation. Combined with the established reward function, it improves the safety and ride comfort of lane-merging while maintaining traffic efficiency.
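The abstract does not give the weighting formula, so the following is only a minimal sketch of one plausible reading: each twin target critic yields a set of Monte Carlo dropout Q-value samples, and the two sample means are combined by inverse-variance weighting. The function name `variance_weighted_q`, the `eps` stabilizer, and the inverse-variance form itself are assumptions made for illustration, not the authors' published method.

```python
from statistics import mean, variance


def variance_weighted_q(q1_samples, q2_samples, eps=1e-8):
    """Combine two sets of MC-dropout Q samples (hypothetical helper).

    Assumption: weights are the inverse sample variances, so the critic
    whose dropout samples agree more closely contributes more.
    """
    m1, m2 = mean(q1_samples), mean(q2_samples)
    # statistics.variance is the unbiased sample variance (needs >= 2 samples).
    v1, v2 = variance(q1_samples), variance(q2_samples)
    w1 = 1.0 / (v1 + eps)  # eps guards against zero variance
    w2 = 1.0 / (v2 + eps)
    return (w1 * m1 + w2 * m2) / (w1 + w2)


# Illustrative numbers only: dropout samples from two target critics.
q1 = [9.0, 11.0]
q2 = [18.0, 22.0]
q_weighted = variance_weighted_q(q1, q2)
# Standard TD3 would instead take the smaller of the two critic estimates.
q_td3_min = min(mean(q1), mean(q2))
```

Under this sketch the combined estimate always lies between the two critic means, whereas the clipped minimum used by standard TD3 always selects the lower one, which is the source of the underestimation the abstract refers to.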

Keywords: autonomous vehicles; reinforcement learning; lane-merging strategy; Q-value estimation

Classification number: U461 [Mechanical Engineering: Vehicle Engineering]
