Human-like decision-making for highway on-ramp merging based on the BC-MAAC algorithm

Authors: Yu Di; Zhang Changwen; Xiong Shuangshuang; Liu Pengyou (School of Automation, Beijing Information Science & Technology University, Beijing 100192, China)

Affiliation: [1] School of Automation, Beijing Information Science & Technology University, Beijing 100192, China

Source: Application Research of Computers (《计算机应用研究》), 2025, No. 1, pp. 117-124 (8 pages)

Funding: National Natural Science Foundation of China (62103057).

Abstract: Multi-agent reinforcement learning algorithms for autonomous driving in complex environments make decisions that lack the intelligence humans display, and their reward functions are difficult to design. To address both problems, this paper proposes a human-like decision-making scheme for highway on-ramp merging based on the BC-MAAC algorithm. BC-MAAC combines the idea of behavior cloning with the multi-actor-attention-critic (MAAC) algorithm. Expert policies are derived from multi-agent expert data collected on the Highway-env platform, and the KL divergence between the derived expert policies and the agents' current policies is used to shape the reward function, which guides the training process. In addition, an action masking mechanism filters out unsafe or invalid actions at every step to improve learning efficiency. Simulation results in two scenarios with different traffic densities show that the proposed algorithm outperforms the baselines overall and improves both traffic efficiency and safety. In the easy mode, it achieves a 100% success rate and improves average speed and average reward by at least 0.73% and 11.14%, respectively. In the hard mode, it achieves a 93.40% success rate and improves average speed and average reward by at least 3.96% and 12.23%, respectively. These results indicate that, guided by the expert reward function, BC-MAAC enables connected autonomous vehicles to cooperate and complete the highway on-ramp merging task in a more human-like manner.
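The two mechanisms named in the abstract, KL-based reward shaping and action masking, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the direction of the KL term (expert relative to agent), the choice to subtract it from an environment reward, and the weight `beta` are all assumptions made here for illustration; the abstract does not specify them.

```python
import math


def masked_softmax(logits, valid_mask):
    """Softmax restricted to valid actions; masked actions get probability 0."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, ok in zip(logits, valid_mask) if ok)
    exps = [math.exp(l - peak) if ok else 0.0
            for l, ok in zip(logits, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def shaped_reward(env_reward, expert_probs, agent_probs, beta=0.1):
    """Hypothetical shaping: penalize divergence from the expert policy.

    Assumption: the KL term is subtracted from the environment reward
    with weight beta. The paper may combine the terms differently.
    """
    return env_reward - beta * kl_divergence(expert_probs, agent_probs)


if __name__ == "__main__":
    # Five hypothetical discrete actions; the first is masked as unsafe.
    logits = [1.0, 2.0, 0.5, 0.0, -1.0]
    mask = [False, True, True, True, True]
    agent = masked_softmax(logits, mask)
    expert = [0.0, 0.7, 0.1, 0.1, 0.1]  # made-up expert distribution
    print("agent policy:", [round(p, 3) for p in agent])
    print("shaped reward:", round(shaped_reward(1.0, expert, agent), 4))
```

In this sketch the penalty is zero when the agent's action distribution matches the expert's and grows as the two diverge, so a larger `beta` pulls training more strongly toward the demonstrated behavior.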

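Keywords: connected autonomous vehicles; intelligent decision-making; highway on-ramp merging; behavior cloning; multi-agent reinforcement learning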

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
