The Study on Off-line Q-learning Model for Single Intersection Signal Timing  Cited by: 5


Authors: Lu Shoufeng (卢守峰)[1], Wei Qinping (韦钦平)[1], Liu Ximin (刘喜敏)[1]

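Affiliation: [1] School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha, Hunan 410114, China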

Source: Control Engineering of China (《控制工程》), 2012, No. 6, pp. 987-992 (6 pages)

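Funding: National Natural Science Foundation of China (71071024; 70701006); Key Scientific Research Project of the Ministry of Education (145); Key Scientific Research Project of the Hunan Provincial Department of Education (09A003; 11C0038); Key Project of the Changsha Science and Technology Bureau (K1106004-11; K1001010-11); Open Fund of the Key Laboratory of Road Structure and Material, Ministry of Transport (kfj100206)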

Abstract: To improve the adaptability and robustness of traffic control systems, this paper uses reinforcement learning to give the traffic control model a learning capability. Signal timing optimization for a single intersection is studied under two modes, fixed cycle and variable cycle. A reward function is constructed for the equal-saturation objective, and off-line Q-learning models are built for two optimization objectives: equal saturation and minimum delay. Discretizing the flow rate resolves the explosion of the state dimension. Numerical examples are used to analyze the structure of the solutions and the distribution of the optimal solutions of the four resulting off-line Q-learning models. The results show that, compared with the on-line Q-learning model, the off-line Q-learning model is better suited to intersection signal timing optimization. Following a "learn off-line, apply on-line" approach, the fixed-cycle minimum-delay off-line Q-learning model is compared with the Webster fixed-cycle model; overall, the average delay per vehicle and the cumulative delay of the former are lower than those of the latter.
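The abstract names three ingredients: discretized flow as the state, an equal-saturation reward, and Q-learning trained off-line before on-line use. The following is a minimal sketch of how these could fit together for a two-phase, fixed-cycle intersection. It is not the authors' model: the saturation flow, cycle length, lost time, flow levels, candidate green times, the random demand model, and all function names are assumptions made for illustration, since the abstract does not give the paper's actual state, action, or reward definitions.

```python
import random

# All constants below are hypothetical values chosen for illustration.
SAT_FLOW = 1800.0                      # saturation flow per phase, veh/h
CYCLE = 60.0                           # fixed cycle length, s
LOST_TIME = 10.0                       # total lost time per cycle, s
FLOW_LEVELS = (300.0, 600.0, 900.0)    # discretized flow levels, veh/h
GREENS = (10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0)  # phase-1 green, s

# State = one discretized flow level per phase (keeps the state space small).
STATES = [(q1, q2) for q1 in FLOW_LEVELS for q2 in FLOW_LEVELS]


def discretize(flow):
    """Map a measured flow to the nearest discretized level."""
    return min(FLOW_LEVELS, key=lambda level: abs(level - flow))


def saturation(flow, green):
    """Degree of saturation x = q * C / (s * g)."""
    return flow * CYCLE / (SAT_FLOW * green)


def reward(state, g1):
    """Equal-saturation reward: zero when both phases are equally saturated."""
    g2 = CYCLE - LOST_TIME - g1
    return -abs(saturation(state[0], g1) - saturation(state[1], g2))


def train(steps=20000, alpha=0.1, gamma=0.5, seed=0):
    """Off-line tabular Q-learning against a synthetic demand model."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in GREENS}
    state = rng.choice(STATES)
    for _ in range(steps):
        action = rng.choice(GREENS)        # pure exploration; Q-learning is off-policy
        next_state = rng.choice(STATES)    # assumed: demand in the next cycle is random
        best_next = max(q[(next_state, a)] for a in GREENS)
        target = reward(state, action) + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


def greedy_green(q, state):
    """On-line use: look up the best phase-1 green for a discretized state."""
    return max(GREENS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q_table = train()
    observed = (discretize(640.0), discretize(880.0))
    print(observed, greedy_green(q_table, observed))
```

Training happens entirely against the synthetic model, and the on-line step is reduced to a table lookup, which is the sense of "learn off-line, apply on-line" in this sketch. A minimum-delay variant would swap `reward` for a negative delay estimate; the delay formula is left out here because the abstract does not specify it.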

Keywords: traffic control; signal timing; off-line; learning; variable cycle

Classification: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]

 
