Authors: ZHANG Meng (张萌), WANG Dian-hai (王殿海)[1], JIN Sheng (金盛)[1] (College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China)
Source: Journal of Zhejiang University: Engineering Science (《浙江大学学报(工学版)》), 2023, No. 12, pp. 2524-2532, 2543 (10 pages in total)
Funding: National Natural Science Foundation of China (52131202, 52072340, 71901193); Zhejiang Provincial Science Fund for Distinguished Young Scholars (LR23E080002).
Abstract: Signal control methods based on deep reinforcement learning suffer from unstable training, slow convergence, and frequent phase changes. To address these problems, a signal control method that integrates domain expertise is proposed by adding a pre-training module and a phase green time calculation module to the double dueling deep Q network (3DQN) algorithm. The pre-training module guides the 3DQN agent to imitate the policy of the Max-Pressure method by optimizing the double Q-learning loss, the supervised margin classification loss, and the regularization loss, which stabilizes and accelerates training. The phase green time calculation module dynamically adjusts the green time of the current phase based on the average time headway and the queue length, reducing lost green time. The method was validated on the simulation platform SUMO, using the intersection of Airport City Avenue and Boao Road in Xiaoshan District, Hangzhou as an example. The experimental results show that the proposed method effectively improves the training speed of the traditional 3DQN algorithm and, compared with traditional control methods, significantly reduces the average vehicle travel time and improves intersection operating efficiency.
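The pre-training idea in the abstract, where the agent imitates Max-Pressure through a supervised margin loss, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the common textbook form of Max-Pressure (choose the phase whose movements have the largest sum of incoming queue minus outgoing queue) and a large-margin loss of the form max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E). The function names, lane identifiers, and the margin value 0.5 are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (incoming lane id, outgoing lane id); the lane naming is hypothetical
Movement = Tuple[str, str]


def max_pressure_phase(phases: Dict[int, List[Movement]],
                       queues: Dict[str, int]) -> int:
    """Return the phase whose movements have the largest total pressure.

    Assumption: the pressure of a movement is the queue on its incoming
    lane minus the queue on its outgoing lane.
    """
    def pressure(movements: List[Movement]) -> int:
        return sum(queues[src] - queues[dst] for src, dst in movements)

    return max(phases, key=lambda p: pressure(phases[p]))


def margin_loss(q_values: Sequence[float], expert_action: int,
                margin: float = 0.5) -> float:
    """Large-margin classification loss for a single state.

    Computes max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l is `margin`
    for every non-expert action and 0 for the expert action. The loss is
    zero only when the expert action leads every other action by at
    least `margin`.
    """
    augmented = [q + (0.0 if a == expert_action else margin)
                 for a, q in enumerate(q_values)]
    return max(augmented) - q_values[expert_action]


if __name__ == "__main__":
    # Two hypothetical phases, one movement each.
    phases = {0: [("n_in", "s_out")], 1: [("e_in", "w_out")]}
    queues = {"n_in": 8, "s_out": 2, "e_in": 3, "w_out": 1}
    expert = max_pressure_phase(phases, queues)
    # The agent currently prefers phase 1, so the loss is positive.
    print(expert, margin_loss([1.0, 2.0], expert))
```

In a full pre-training step, this term would be combined with the double Q-learning loss and a regularization loss, as the abstract states. The weighting between the three terms is not given in this record.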
Keywords: traffic signal control; reinforcement learning; deep reinforcement learning; supervised learning; pre-training
Classification number: U491.4 [Transportation Engineering: Transportation Planning and Management]