Authors: Yu Di (于镝); Zhang Changwen (张昌文); Xiong Shuangshuang (熊双双); Liu Pengyou (刘朋友)
Affiliation: [1] School of Automation, Beijing Information Science & Technology University, Beijing 100192, China
Source: Application Research of Computers (《计算机应用研究》), 2025, No. 1, pp. 117-124 (8 pages)
Funding: National Natural Science Foundation of China (62103057).
Abstract: Multi-agent reinforcement learning algorithms for autonomous driving in complex environments make decisions that lack human-like intelligence, and their reward functions are difficult to design. To address both problems, this paper proposes a human-like decision-making scheme for highway on-ramp merging based on the BC-MAAC algorithm. BC-MAAC combines the idea of behavior cloning with the multi-actor-attention-critic (MAAC) algorithm. Expert policies are derived from multi-agent expert data collected on the Highway-env platform, and the KL divergence between the derived expert policies and the agents' current policies is used to shape the reward function that guides training. In addition, an action masking mechanism filters out unsafe or invalid actions at every step to improve learning efficiency. Simulation results in two scenarios with different traffic densities show that the proposed algorithm outperforms the baseline algorithms overall and improves both traffic efficiency and safety. In the easy mode, it achieves a 100% success rate and improves the average speed and average reward by at least 0.73% and 11.14%, respectively. In the hard mode, it achieves a 93.40% success rate and improves the average speed and average reward by at least 3.96% and 12.23%, respectively. These results indicate that BC-MAAC, by guiding connected autonomous vehicles with an expert-derived reward function, enables them to complete the highway on-ramp merging task cooperatively and in a more human-like manner.
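Keywords: connected autonomous vehicles; intelligent decision-making; highway on-ramp merging; behavior cloning; multi-agent reinforcement learning
Classification No.: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]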
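
The abstract names two mechanisms: a reward shaped by the KL divergence between an expert policy and the agent's current policy, and an action mask that removes unsafe or invalid actions at each step. The following is a minimal sketch of how these two pieces could fit together for a single agent with a discrete action set. It is not the paper's implementation. The function names, the weight `beta`, the direction of the divergence (expert relative to agent), and the choice to subtract the penalty from an environment reward are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math


def masked_softmax(logits, mask):
    """Softmax restricted to allowed actions; masked actions get probability 0.

    `mask[i]` is True when action i is allowed (hypothetical convention).
    """
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def shaped_reward(env_reward, expert_probs, agent_probs, beta=0.1):
    """Penalize the agent for diverging from the expert policy.

    Assumption: reward = environment reward - beta * KL(expert || agent).
    The abstract only says the KL divergence "shapes" the reward.
    """
    return env_reward - beta * kl_divergence(expert_probs, agent_probs)


# Illustrative use with five discrete actions; action 2 is masked as unsafe.
mask = [True, True, False, True, True]
agent_probs = masked_softmax([0.2, 1.0, 3.0, 0.5, -0.3], mask)
expert_probs = masked_softmax([0.0, 2.0, 0.0, 0.1, -1.0], mask)
reward = shaped_reward(1.0, expert_probs, agent_probs)
```

Applying the same mask to both distributions keeps the divergence finite, since neither policy assigns probability to an action the other cannot take. When the agent's distribution matches the expert's, the penalty vanishes and the shaped reward equals the environment reward.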