A Behavioral Decision-Making Method Based on Improved Deep Reinforcement Learning Algorithms


Author: JIA Ruihao (贾瑞豪) (School of Automobile, Chang'an University, Xi'an 710064, China)

Affiliation: [1] School of Automobile, Chang'an University, Xi'an 710064, Shaanxi, China

Source: Automobile Applied Technology (《汽车实用技术》), 2025, No. 1, pp. 25-30 (6 pages)

Abstract: Traditional deep reinforcement learning algorithms explore poorly during training, which in autonomous-driving decision-making tasks leads simultaneously to low driving efficiency, slow convergence, and a low decision success rate. To address this, a decision-making method based on a dueling double deep Q-network combined with expert evaluation is proposed. An offline expert model and an online model are built, with an adaptive balance factor introduced between them; the online model is built on a dueling deep Q-network with a prioritized experience replay mechanism that uses an adaptive importance coefficient; and a reward function that accounts for driving efficiency, safety, and comfort is designed. The results show that, compared with D3QN and PERD3QN, the algorithm improves convergence speed by 25.93% and 20.00%, raises the decision success rate by 3.19% and 2.77%, reduces the average number of steps by 6.40% and 0.14%, and increases the average vehicle speed by 7.46% and 0.42%, respectively.
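The abstract names several building blocks without giving their formulas. The sketch below illustrates only the standard, textbook forms of three of them: dueling aggregation, the double DQN target, and prioritized experience replay. It is not the paper's implementation. The paper's adaptive balance factor and adaptive importance coefficient are not specified in the abstract, so they are not reproduced here; `alpha` and `beta` are ordinary fixed parameters, and all function names are hypothetical.

```python
# Minimal sketch of the standard components behind a D3QN with prioritized
# replay. Function names are hypothetical; the adaptive terms described in
# the abstract are NOT modelled (alpha and beta are plain constants here).


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def per_probabilities(priorities, alpha):
    """Prioritized replay sampling: P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def per_weights(probs, beta):
    """Importance-sampling weights w_i = (N * P(i))^(-beta),
    normalised by the largest weight."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    top = max(raw)
    return [w / top for w in raw]
```

In this form, a transition with a larger priority is sampled more often and receives a smaller importance weight, which offsets the sampling bias. An adaptive scheme such as the one the abstract describes would presumably vary these coefficients during training, but how it does so is not stated in the abstract.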

Keywords: autonomous driving; behavioral decision-making; deep reinforcement learning; imitation learning; improved DQN algorithm

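Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; U463.6 [Automation and Computer Technology: Control Science and Engineering]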

 
