Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control  


Authors: Zihao Sheng, Zilin Huang, Sikai Chen

Affiliation: [1] Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA

Source: Communications in Transportation Research (交通研究通讯, English edition), 2024, Issue 1, pp. 301-319 (19 pages)

Funding: University of Wisconsin-Madison's Center for Connected and Automated Transportation (CCAT), a part of the larger CCAT consortium, a USDOT Region 5 University Transportation Center funded by the U.S. Department of Transportation, Award #69A3552348305. The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein, and do not necessarily reflect the official views or policies of the sponsoring organization.

Abstract: Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency than model-free RL by utilizing a virtual environment model. However, obtaining sufficiently accurate representations of environmental dynamics is challenging because of uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performance of model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires substantial training time to learn from scratch, potentially limiting its advantages over model-free approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual reinforcement learning framework aimed at enhancing learning efficiency by infusing established expert knowledge into the learning process and avoiding the issue of beginning from zero. Our approach integrates traffic expert knowledge into a virtual environment model, employing the intelligent driver model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to complex scenarios. We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch. The proposed approach is applied to connected automated vehicle (CAV) trajectory control tasks for the dissipation of stop-and-go waves in mixed traffic flows. The experimental results demonstrate that our proposed approach enables the CAV agent to achieve superior performance in trajectory control compared with the baseline agents in terms of sample efficiency, traffic flow smoothness, and traffic mobility.
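
The split between IDM base dynamics and a learned residual can be illustrated with a minimal sketch. It uses the standard IDM car-following formula; the parameter values, the Euler time step, and the names `residual_fn`, `knowledge_informed_step`, and `residual_action` are illustrative assumptions, not taken from the paper, and `residual_fn` stands in for whatever neural network the authors train.

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=1.5, s0=2.0, delta=4.0):
    """Standard IDM acceleration (parameter defaults are illustrative assumptions).

    v   : follower speed (m/s)
    gap : bumper-to-bumper distance to the leader (m), must be > 0
    dv  : approach rate, follower speed minus leader speed (m/s)
    """
    # Desired dynamic gap: standstill gap + time-headway term + braking term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

def knowledge_informed_step(v, gap, v_lead, residual_fn, dt=0.1):
    """One Euler step of a virtual environment model: IDM prior plus a residual.

    residual_fn is a hypothetical placeholder for a learned correction
    (e.g. a small neural network) mapping (v, gap, dv) to an acceleration.
    """
    dv = v - v_lead
    accel = idm_acceleration(v, gap, dv) + residual_fn(v, gap, dv)
    v_next = max(0.0, v + accel * dt)      # speeds cannot go negative
    gap_next = gap + (v_lead - v) * dt     # gap shrinks when the follower is faster
    return v_next, gap_next

def residual_action(base_controller, residual_policy, state):
    """Residual RL action: a traditional controller's output plus a learned correction."""
    return base_controller(state) + residual_policy(state)

if __name__ == "__main__":
    zero_residual = lambda v, gap, dv: 0.0
    print(knowledge_informed_step(10.0, 20.0, 10.0, zero_residual))
```

With a zero residual the model reduces to plain IDM, so the learned component only has to capture what IDM gets wrong rather than the full dynamics.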

Keywords: Model-based reinforcement learning; Residual policy learning; Mixed traffic flow; Connected automated vehicles

Classification code: TN9 [Electronics and Telecommunications: Information and Communication Engineering]

 
