Coordinated Sequential Optimization for Network-wide Traffic Signal Control Based on Heterogeneous Multi-agent Transformer

Authors: CHEN Xiqun[1], ZHU Yizhang, XIE Ningke, GENG Maosi, LV Chaofeng

Affiliations: [1] Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; [2] Institute of Intelligent Transportation Systems, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China; [3] College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China

Source: Journal of Transportation Systems Engineering and Information Technology, 2024, No. 3, pp. 114-126 (13 pages)

Funding: National Natural Science Foundation of China (72171210).

Abstract: To address the complexity of network-wide traffic signal control, this study proposes a coordinated sequential optimization method based on a Heterogeneous Multi-Agent Transformer (HMATLight) that improves the performance of signal control policies at multiple intersections across an urban road network. First, to capture the spatial correlation of traffic flow among intersections, a value encoder based on the self-attention mechanism learns representations of traffic observations and enables network-level communication. Second, to cope with the non-stationary environment faced during multi-agent policy updates, a policy decoder based on multi-agent advantage decomposition makes decisions sequentially: each agent outputs its optimal response action given the joint actions of the agents that precede it. Finally, an action-masking mechanism based on effectively moving vehicles adaptively adjusts the decision frequency within a time-adequate interval, and a spatio-temporal pressure reward function that accounts for waiting fairness is proposed; together they further improve policy performance and practicality. Experiments on Hangzhou road-network datasets validate the method. HMATLight outperforms all baseline models on both datasets across all five performance metrics. Compared with the best-performing baseline, it reduces the average travel time by 10.89%, the average queue length by 18.84%, and the average waiting time by 22.21%. In addition, HMATLight generalizes better and significantly reduces the number of cases in which vehicles wait excessively long.
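The abstract combines three mechanisms at decision time: a self-attention encoder over all intersections, a decoder in which each agent chooses after seeing the actions of the agents before it, and an action mask. The sequential ordering rests on the multi-agent advantage decomposition known from MAT-style methods: the joint advantage equals the sum, over agents taken in order, of each agent's advantage conditioned on the actions already chosen by its predecessors, so each agent can improve its own term given those actions. The block below is a minimal, untrained sketch in plain Python, not the authors' HMATLight implementation. Everything in it is an illustrative assumption: single-head attention, random weights, greedy decoding in a fixed agent order, summarizing preceding actions as their empirical distribution, and a hypothetical masking rule (keep the current phase until a minimum green time has elapsed, then allow only phases that serve at least one vehicle). The function names `self_attention`, `action_mask` and `sequential_decode` are likewise hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(obs: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> List[Vector]:
    """Single-head scaled dot-product self-attention across all intersections:
    each output is a softmax-weighted mix of every intersection's value vector."""
    q = [matvec(wq, o) for o in obs]
    k = [matvec(wk, o) for o in obs]
    v = [matvec(wv, o) for o in obs]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def action_mask(vehicles_per_phase: Sequence[int], current_phase: int,
                elapsed: float, min_green: float) -> List[bool]:
    """HYPOTHETICAL masking rule (not the paper's exact definition):
    - before min_green has elapsed, only the current phase may be chosen;
    - afterwards, a phase is allowed only if it serves at least one vehicle;
    - if no phase serves any vehicle, keep the current phase."""
    n = len(vehicles_per_phase)
    if elapsed < min_green:
        return [p == current_phase for p in range(n)]
    mask = [count > 0 for count in vehicles_per_phase]
    if not any(mask):
        mask[current_phase] = True
    return mask


def sequential_decode(emb: List[Vector], masks: List[List[bool]],
                      w_emb: Matrix, w_prev: Matrix, n_phases: int) -> List[int]:
    """Greedy sequential decoding in a fixed agent order: agent i scores its
    phases from its own embedding plus the action distribution of agents
    0..i-1; masked phases get a logit of -inf and can never be chosen."""
    actions: List[int] = []
    for e, mask in zip(emb, masks):
        if not any(mask):
            raise ValueError("each agent needs at least one allowed phase")
        prev = [0.0] * n_phases  # share of preceding agents that chose each phase
        for a in actions:
            prev[a] += 1.0 / len(actions)
        logits = [x + y for x, y in zip(matvec(w_emb, e), matvec(w_prev, prev))]
        logits = [s if ok else -math.inf for s, ok in zip(logits, mask)]
        actions.append(max(range(n_phases), key=lambda p: logits[p]))
    return actions


if __name__ == "__main__":
    rng = random.Random(0)

    def rand_mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    n_phases, obs_dim, d = 4, 6, 4
    obs = rand_mat(3, obs_dim)  # one observation vector per intersection
    emb = self_attention(obs, rand_mat(d, obs_dim), rand_mat(d, obs_dim), rand_mat(d, obs_dim))
    masks = [
        action_mask([3, 0, 5, 1], current_phase=0, elapsed=20, min_green=10),
        action_mask([0, 0, 0, 0], current_phase=2, elapsed=20, min_green=10),
        action_mask([2, 2, 2, 2], current_phase=1, elapsed=5, min_green=10),
    ]
    print(sequential_decode(emb, masks, rand_mat(n_phases, d), rand_mat(n_phases, n_phases), n_phases))
```

Under the same assumptions, heterogeneous intersections with fewer phases could be handled by padding every agent to a common `n_phases` and masking the unused slots. A trained model would learn the weight matrices with a policy-gradient objective and would typically sample from the masked softmax during training instead of always taking the arg-max.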

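Keywords: intelligent transportation; deep reinforcement learning; network-wide signal control; heterogeneous multi-agent; spatio-temporal pressure reward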

CLC number: U491 [Transportation Engineering / Transportation Planning and Management]

 
