Authors: CHEN Xiqun[1]; ZHU Yizhang; XIE Ningke; GENG Maosi; LV Chaofeng
Affiliations: [1] Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; [2] Institute of Intelligent Transportation Systems, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China; [3] College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
Source: Journal of Transportation Systems Engineering and Information Technology (《交通运输系统工程与信息》), 2024, No. 3, pp. 114-126 (13 pages)
Funding: National Natural Science Foundation of China (72171210).
Abstract: To address the complexity of traffic signal control in urban road networks, this study proposes a coordinated sequential optimization method based on a Heterogeneous Multi-Agent Transformer (HMATLight) that optimizes network-wide traffic signals and improves the performance of signal control policies across multiple intersections. First, to capture the spatial correlation of multi-intersection traffic flow, a value encoder based on a self-attention mechanism learns traffic observation representations and realizes network-level communication. Second, to cope with the non-stationary environment that arises during multi-agent policy updates, a policy decoder based on multi-agent advantage decomposition sequentially outputs the optimal response action conditioned on the joint actions of the preceding agents. Finally, an action-masking mechanism based on effective driving vehicles adapts the decision frequency within the time-adequate interval, and a spatio-temporal pressure reward function that accounts for waiting fairness further enhances policy performance and practicality. Experiments on Hangzhou road network datasets validate the method. The results show that HMATLight outperforms all baselines on two datasets across five performance metrics. Compared with the best-performing baseline, HMATLight reduces the average travel time by 10.89%, the average queue length by 18.84%, and the average waiting time by 22.21%. Furthermore, HMATLight generalizes better and significantly reduces instances of excessively long vehicle waiting times.
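Keywords: intelligent transportation; deep reinforcement learning; network-wide traffic signal control; heterogeneous multi-agent; spatio-temporal pressure reward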
Classification No.: U491 [Transportation Engineering: Transportation Planning and Management]