Traffic Signal Control Based on Deep Reinforcement Learning A2C Integrated with State Prediction

Authors: YE Baolin; SUN Ruitao; LI Lingxi [3]; WU Weimin [4]

Affiliations: [1] School of Information Science and Engineering, Jiaxing University, Jiaxing 314001, Zhejiang, China; [2] School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China; [3] Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette 47907, USA; [4] State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, Zhejiang, China

Source: Computer Engineering (《计算机工程》), 2025, No. 5, pp. 33-42 (10 pages)

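Funding: National Natural Science Foundation of China (61603154); Zhejiang Provincial Natural Science Foundation (LTGS23F030002); Zhejiang Province "Pioneer" and "Leading Goose" R&D Program (2023C01174); Jiaxing Applied Basic Research Project (2023AY11034); Open Research Project of the State Key Laboratory of Industrial Control Technology (ICT2022B52).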

Abstract: Existing reinforcement learning-based traffic signal control methods mainly use historical traffic states and the real-time traffic state at the current time step to determine the control strategy for the next time step, so the control strategy always lags the traffic state by one time step. To address this problem, this study proposes a traffic signal control method based on deep reinforcement learning Advantage Actor-Critic (A2C) integrated with traffic state prediction. First, a Long Short-Term Memory (LSTM) network is designed to predict the traffic state of the road network at future time steps, so that the control strategy can respond more accurately to the decision-making needs of real-time traffic conditions. Second, a Kalman filter is designed to fuse the collected historical traffic state data with the future traffic state data predicted by the LSTM network, improving the accuracy and robustness of the data fed into the deep reinforcement learning model. Third, an A2C algorithm integrated with a bidirectional LSTM network is proposed, so that the model can more fully capture the temporal dependencies in traffic flow and make more efficient and stable signal control decisions. Finally, simulation tests on the Simulation of Urban Mobility (SUMO) platform show that, compared with traditional traffic signal control methods and an A2C-based deep reinforcement learning method, the proposed method achieves better traffic signal control performance under all three traffic flow conditions: low-peak, off-peak, and peak.
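
The fusion step in the abstract can be illustrated with a minimal sketch. The abstract does not give the state variables, the noise covariances, or the filter structure, so everything below is an assumption made for illustration: the state is a list of per-lane queue lengths, the LSTM prediction is treated as the prior, the detector reading is treated as the measurement, and each lane is fused independently with a scalar Kalman update. The names `fuse` and `fuse_state` and all numbers are hypothetical and are not taken from the paper.

```python
def fuse(observed, predicted, obs_var, pred_var):
    """Scalar Kalman update: combine a predicted value with a measurement."""
    # Kalman gain: how much weight the measurement gets relative to the prior.
    gain = pred_var / (pred_var + obs_var)
    # Move the prediction toward the measurement in proportion to the gain.
    fused = predicted + gain * (observed - predicted)
    # The fused estimate is less uncertain than the prediction alone.
    fused_var = (1.0 - gain) * pred_var
    return fused, fused_var


def fuse_state(observed_state, predicted_state, obs_var, pred_var):
    """Apply the scalar update independently to every element of the state."""
    return [fuse(o, p, obs_var, pred_var)[0]
            for o, p in zip(observed_state, predicted_state)]


# Hypothetical queue lengths (vehicles) on four approach lanes.
observed = [12.0, 5.0, 8.0, 3.0]    # assumed detector readings
predicted = [14.0, 5.0, 6.0, 3.0]   # assumed LSTM output
fused = fuse_state(observed, predicted, obs_var=1.0, pred_var=1.0)
print(fused)
```

With equal variances the gain is 0.5, so each fused value is the midpoint of the observation and the prediction; a larger `pred_var` shifts the result toward the observation. In a full system, the fused state would be the input to the A2C agent.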

Keywords: traffic signal control; Advantage Actor-Critic; traffic state prediction; bidirectional long short-term memory network

Classification: TN929 [Electronics and Telecommunications: Communication and Information Systems]
