Authors: Zihao Sheng, Zilin Huang, Sikai Chen
Source: Communications in Transportation Research, 2024, Issue 1, pp. 301-319 (19 pages)
Funding: University of Wisconsin-Madison's Center for Connected and Automated Transportation (CCAT), part of the larger CCAT consortium, a USDOT Region 5 University Transportation Center funded by the U.S. Department of Transportation, Award #69A3552348305. The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein, and do not necessarily reflect the official views or policies of the sponsoring organization.
Abstract: Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency than model-free RL by utilizing a virtual environment model. However, obtaining sufficiently accurate representations of environmental dynamics is challenging because of uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performance of model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires substantial training time to learn from scratch, potentially limiting its advantages over model-free approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual reinforcement learning framework aimed at enhancing learning efficiency by infusing established expert knowledge into the learning process and avoiding the issue of beginning from zero. Our approach integrates traffic expert knowledge into a virtual environment model, employing the intelligent driver model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to complex scenarios. We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch. The proposed approach is applied to connected automated vehicle (CAV) trajectory control tasks for the dissipation of stop-and-go waves in mixed traffic flows. The experimental results demonstrate that our proposed approach enables the CAV agent to achieve superior performance in trajectory control compared with the baseline agents in terms of sample efficiency, traffic flow smoothness, and traffic mobility.
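
The abstract names two concrete mechanisms: the intelligent driver model (IDM) as the knowledge-informed base dynamics of the virtual environment, and a learned residual layered on top of both the environment model and a traditional controller. The sketch below illustrates that composition; the IDM equation is the standard published one, but all parameter values, function names, and the toy state layout (gap, speed, speed difference) are illustrative assumptions, not the authors' implementation.

    import math

    def idm_acceleration(gap, v, dv,
                         v0=30.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4.0):
        """Intelligent driver model (IDM): the expert-knowledge base dynamics.

        gap: bumper-to-bumper spacing to the leader (m)
        v:   ego speed (m/s)
        dv:  approach rate, ego speed minus leader speed (m/s)
        """
        s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
        return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

    def residual_env_step(state, action, residual_net, dt=0.1):
        """Virtual environment model: IDM-style physics plus a learned residual.

        residual_net is a stand-in for the paper's neural network: here, any
        callable mapping (state, action) to a correction on the next state.
        """
        gap, v, dv = state
        v_next = max(0.0, v + action * dt)       # base kinematics
        dv_next = dv + (v_next - v)              # leader speed held constant in this toy step
        gap_next = gap - dv * dt
        base_next = (gap_next, v_next, dv_next)
        correction = residual_net(state, action) # learned residual dynamics
        return tuple(x + c for x, c in zip(base_next, correction))

    def hybrid_action(state, residual_policy):
        """Residual RL control: expert controller output plus a learned term."""
        base = idm_acceleration(*state)          # knowledge-informed prior
        return base + residual_policy(state)     # RL refines, not replaces

    # Toy usage: with a zero residual, the agent reproduces pure IDM behaviour.
    state = (25.0, 20.0, 1.0)
    print(hybrid_action(state, lambda s: 0.0))

With a zero residual the controller falls back to plain IDM, which reflects the stated design goal: learning starts from a sensible expert prior rather than from scratch, and the residual terms only have to capture what the expert model misses.
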
Keywords: Model-based reinforcement learning; Residual policy learning; Mixed traffic flow; Connected automated vehicles
Classification: TN9 [Electronics and Telecommunications - Information and Communication Engineering]