Authors: ZHANG Zhiyong [1,2]; HUANG Dayang; HUANG Caixia [1,3]; HU Lin; DU Ronghua [2]
Affiliations: [1] Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment, Changsha University of Science and Technology, Changsha 410114; [2] College of Automobile and Mechanical Engineering, Changsha University of Science and Technology, Changsha 410114; [3] Hunan Provincial Key Laboratory of Automotive Power and Transmission System, Hunan Institute of Technology, Xiangtan 411104
Source: Journal of Mechanical Engineering (《机械工程学报》), 2023, No. 8, pp. 224-234 (11 pages)
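Funding: National Natural Science Foundation of China (61973047); Natural Science Foundation of Hunan Province (2021JJ30182, 2022JJ50020); Scientific Research Project of the Hunan Provincial Department of Education (20A018); Open Fund of the Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment (Changsha University of Science and Technology) (2020YB02).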
Abstract: To improve the overall performance of automated lane-merging strategies, the Q-value estimation method of the twin delayed deep deterministic policy gradient (TD3) algorithm and the reward function are improved. The vehicle lane-merging process is modeled as a reinforcement learning problem through a Markov decision process, and the effect of Q-value underestimation in TD3 on merging decisions is analyzed. Monte Carlo dropout is applied to the twin target critic networks of TD3 to obtain two Q-value estimation samples, and a Q-value estimation method based on a sample-variance-weighted average of these samples is proposed to improve the accuracy of TD3's Q-value estimates. With completion of the merging task given priority, a comprehensive reward function is designed that accounts for safety, comfort, and traffic efficiency during merging. Using the improved TD3 algorithm and the reward function, a lane-merging strategy for autonomous vehicles is learned and tested in the BARK simulator. The results show that the improved TD3 algorithm significantly improves Q-value estimation accuracy and, combined with the proposed reward function, improves the safety and ride comfort of lane merging while maintaining traffic efficiency.
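The abstract does not give the weighting formula, so the following is only a minimal sketch of one plausible reading: each target critic is evaluated several times under Monte Carlo dropout, and the two resulting mean estimates are combined with inverse-variance weights. The function names, the use of inverse-variance weights, and the `eps` stabilizer are all assumptions made for illustration, not the authors' published method. Standard TD3's clipped double-Q target (the minimum of the two critics) is included for comparison, since that minimum is the source of the underestimation the abstract mentions.

```python
import statistics


def clipped_double_q(samples_1, samples_2):
    """Standard TD3 target: the smaller of the two critics' mean estimates."""
    return min(statistics.fmean(samples_1), statistics.fmean(samples_2))


def variance_weighted_q(samples_1, samples_2, eps=1e-8):
    """Hypothetical variance-weighted target (an assumption, not the paper's formula).

    samples_1, samples_2: Q-value samples from repeated dropout passes
    through target critic 1 and target critic 2 (at least two samples each).
    The critic whose samples vary less receives the larger weight.
    """
    mean_1 = statistics.fmean(samples_1)
    mean_2 = statistics.fmean(samples_2)
    # Unbiased sample variances; eps avoids division by zero.
    weight_1 = 1.0 / (statistics.variance(samples_1) + eps)
    weight_2 = 1.0 / (statistics.variance(samples_2) + eps)
    return (weight_1 * mean_1 + weight_2 * mean_2) / (weight_1 + weight_2)


if __name__ == "__main__":
    critic_1 = [9.0, 11.0]  # low spread, mean 10
    critic_2 = [4.0, 8.0]   # high spread, mean 6
    print("clipped double-Q :", clipped_double_q(critic_1, critic_2))
    print("variance-weighted:", variance_weighted_q(critic_1, critic_2))
```

In this toy input the weighted estimate falls between the two critic means and closer to the lower-variance critic, whereas the clipped target always takes the smaller mean.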