Application of Sample-optimized PPO Algorithm in Single Intersection Signal Control

Authors: ZHANG Guo-You [1]; ZHANG Xin-Wu (College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)

Affiliation: [1] College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024

Source: Computer Systems & Applications, 2024, No. 6, pp. 161-168 (8 pages)

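Funding: National Natural Science Foundation of China (62072325); Natural Science Foundation of Shanxi Province (202203021221145); Science and Technology Innovation Fund of Taiyuan University of Science and Technology (20212039); Basic Research Program of Shanxi Province (202103021224272).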

Abstract: Optimizing traffic signal control strategies can improve the efficiency of vehicular traffic on roads and alleviate congestion. Value function-based deep reinforcement learning algorithms have difficulty optimizing signal control strategies at single intersections efficiently. To address this, the study develops a single-intersection signal control method based on a sample-optimized proximal policy optimization algorithm, called modified proximal policy optimization (MPPO). The method applies a maximization-based extraction to the surrogate objective function of the traditional PPO algorithm, which improves the quality of the samples the model selects. It uses a multidimensional traffic state vector as the model's observation input so that it can promptly track and exploit dynamic changes in road traffic conditions. To verify the accuracy and effectiveness of the MPPO model, it is compared with value function-based reinforcement learning control methods in the urban traffic microscopic simulation software SUMO. Simulation experiments show that, compared with the value function-based methods, the proposed method is closer to real traffic scenarios, significantly accelerates the convergence of cumulative vehicle waiting time, noticeably reduces the average vehicle queue length and average waiting time, and effectively improves vehicle throughput at the single intersection.
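For readers unfamiliar with the surrogate objective function mentioned in the abstract, the following is a minimal sketch of the standard clipped surrogate objective used by the original PPO algorithm. It is not the MPPO variant: the abstract does not give the form of the modification, so none is attempted here. The function name `clipped_surrogate`, its parameters, and the default clipping value 0.2 are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def clipped_surrogate(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default; the paper's value is not given
) -> float:
    """Batch mean of the standard PPO clipped surrogate objective.

    Illustrative only: this is plain PPO, not the paper's MPPO.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("batch must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In training, this quantity is maximized with respect to the new policy's parameters. The clipping removes the incentive to move the probability ratio outside the interval around 1, which keeps each policy update close to the policy that collected the samples.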

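Keywords: traffic signal control; deep reinforcement learning; proximal policy optimization algorithm; surrogate objective function; state feature vector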

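Classification: U491.54 [Transportation Engineering: Transportation Planning and Management]; TP18 [Transportation Engineering: Road and Railway Engineering]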
