TD3 Algorithm Improving and Lane-merging Strategy Learning for Autonomous Vehicles

Cited by: 4


Authors: ZHANG Zhiyong [1,2]; HUANG Dayang; HUANG Caixia [1,3]; HU Lin; DU Ronghua [2]

Affiliations: [1] Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment, Changsha University of Science and Technology, Changsha 410114; [2] College of Automobile and Mechanical Engineering, Changsha University of Science and Technology, Changsha 410114; [3] Hunan Provincial Key Laboratory of Automotive Power and Transmission System, Hunan Institute of Technology, Xiangtan 411104

Source: Journal of Mechanical Engineering (机械工程学报), 2023, No. 8, pp. 224-234 (11 pages)

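Funding: Supported by the National Natural Science Foundation of China (61973047); the Natural Science Foundation of Hunan Province (2021JJ30182, 2022JJ50020); the Scientific Research Project of the Hunan Provincial Department of Education (20A018); and the Open Fund of the Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment (Changsha University of Science and Technology) (2020YB02).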

Abstract: To improve the overall performance of automated lane merging, the Q-value estimation method and the reward function of the twin delayed deep deterministic policy gradient (TD3) algorithm are improved. The vehicle lane-merging process is modeled as a reinforcement learning problem through a Markov decision process, and the influence of TD3's Q-value underestimation on lane-merging decisions is analyzed. Monte Carlo dropout is performed on the twin target critic networks of TD3 to obtain two Q-value estimation samples, and on that basis a Q-value estimation method using a sample-variance-weighted average is proposed to improve the Q-value estimation accuracy of TD3. With priority given to completing the lane-merging task, a comprehensive reward function is established that fully accounts for safety, comfort, and traffic efficiency during merging. Based on the improved TD3 algorithm and reward function, a lane-merging strategy for autonomous vehicles is learned and tested in the BARK simulator. The results show that the improved TD3 algorithm significantly increases Q-value estimation accuracy and that, combined with the proposed reward function, it improves the safety and ride comfort of lane merging while maintaining traffic efficiency.
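The Q-value estimation idea in the abstract can be illustrated with a small sketch. Standard TD3 takes the minimum of its two target critics, which is the usual source of underestimation; the sketch below instead blends the two critics with weights derived from their sample variances. The abstract does not give the paper's weighting formula, so the inverse-variance weights used here are an assumption, as are the function names, the `eps` parameter, the sample values, and the premise that each critic is evaluated several times with dropout active so that a sample variance can be computed.

```python
from statistics import mean, variance


def variance_weighted_q(q1_samples, q2_samples, eps=1e-8):
    """Blend two critics' Monte Carlo dropout Q samples into one estimate.

    Assumption: inverse-variance weighting (lower sample variance means a
    larger weight). This is one common choice, not necessarily the paper's.
    """
    m1, m2 = mean(q1_samples), mean(q2_samples)
    # Unbiased sample variance; each list needs at least two samples.
    v1, v2 = variance(q1_samples), variance(q2_samples)
    w1 = 1.0 / (v1 + eps)  # eps guards against division by zero
    w2 = 1.0 / (v2 + eps)
    return (w1 * m1 + w2 * m2) / (w1 + w2)


def clipped_double_q(q1_samples, q2_samples):
    """Baseline for comparison: standard TD3 uses the smaller critic value."""
    return min(mean(q1_samples), mean(q2_samples))


# Hypothetical Q samples from repeated stochastic (dropout) forward passes.
q1 = [10.0, 10.2, 9.8, 10.1, 9.9]  # target critic 1: tight spread
q2 = [8.0, 9.5, 7.0, 10.0, 8.5]    # target critic 2: wide spread

blended = variance_weighted_q(q1, q2)
baseline = clipped_double_q(q1, q2)
```

With these made-up samples the blended estimate stays close to the more consistent critic, while the minimum-based baseline simply follows the lower and noisier one. A weighted average with positive weights always lies between the two critic means, so it can never fall below the clipped baseline.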

Keywords: autonomous vehicles; reinforcement learning; lane-merging strategy; Q-value estimation

Classification code: U461 [Mechanical Engineering: Vehicle Engineering]

 
