Affiliation: [1] School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha 410114, Hunan, China
Source: Control Engineering of China, 2012, No. 6, pp. 987-992 (6 pages)
Funding: National Natural Science Foundation of China (71071024; 70701006); Ministry of Education key research project (145); Hunan Provincial Education Department key research projects (09A003; 11C0038); Changsha Science and Technology Bureau key projects (K1106004-11; K1001010-11); Open Fund of the Key Laboratory of Road Structure and Material, Ministry of Transport (kfj100206)
Abstract: To improve the adaptability and robustness of traffic control systems, reinforcement learning is used to give the traffic control model a learning capability. Signal timing optimization at a single intersection is studied under both fixed-cycle and variable-cycle modes. A reward function is constructed for the equal-saturation objective, and off-line Q-learning models are established for two optimization objectives: equal saturation and minimum delay. Discretizing the flow rate resolves the state-dimension explosion problem. A numerical example is used to analyze the solution structure and the distribution of optimal solutions of the four off-line Q-learning models; the results show that, compared with the on-line Q-learning model, the off-line Q-learning model is better suited to intersection signal timing optimization. Using an "off-line learning, on-line application" approach, the fixed-cycle minimum-delay off-line Q-learning model is compared with the Webster fixed-cycle model; on the whole, the former yields lower average vehicle delay and lower cumulative delay than the latter.
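The abstract's pipeline (discretize flow rates into states, train a Q-table off-line over sampled traffic scenarios with a delay-based reward, then apply the learned timings on-line) can be sketched as follows. This is a minimal illustrative sketch only: the cycle length, candidate green splits, flow bins, and the toy delay function are all assumptions for demonstration, not the paper's actual formulation.

```python
import random

CYCLE = 60                            # assumed fixed cycle length (s)
GREEN_SPLITS = [20, 25, 30, 35, 40]   # candidate green times for phase 1 (actions)
FLOW_BINS = [200, 400, 600]           # discretize arrival flow (veh/h) to curb state explosion


def discretize(flow):
    """Map a continuous flow rate to a discrete state index (the paper's
    remedy for the state-dimension explosion)."""
    for i, bound in enumerate(FLOW_BINS):
        if flow < bound:
            return i
    return len(FLOW_BINS)


def delay(flow1, flow2, green1):
    """Toy stand-in for intersection delay: penalize mismatch between the
    green-time ratio and the flow ratio (delay-minimization objective)."""
    return abs(green1 / CYCLE - flow1 / (flow1 + flow2)) * (flow1 + flow2)


def offline_q_learning(episodes=5000, alpha=0.1, eps=0.2, seed=0):
    """Off-line training over randomly sampled flow scenarios. Each cycle's
    timing choice is treated as a one-step decision, so no discounting."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        f1, f2 = rng.uniform(100, 800), rng.uniform(100, 800)
        s = (discretize(f1), discretize(f2))
        if s not in Q:
            Q[s] = [0.0] * len(GREEN_SPLITS)
        # epsilon-greedy action selection over candidate green splits
        if rng.random() < eps:
            a = rng.randrange(len(GREEN_SPLITS))
        else:
            a = max(range(len(GREEN_SPLITS)), key=lambda i: Q[s][i])
        r = -delay(f1, f2, GREEN_SPLITS[a])       # reward = negative delay
        Q[s][a] += alpha * (r - Q[s][a])          # one-step Q-update
    return Q


def apply_online(Q, f1, f2):
    """'Off-line learning, on-line application': a pure table lookup."""
    s = (discretize(f1), discretize(f2))
    values = Q.get(s, [0.0] * len(GREEN_SPLITS))
    return GREEN_SPLITS[max(range(len(GREEN_SPLITS)), key=lambda i: values[i])]
```

Because all learning happens off-line, the on-line step is a constant-time table lookup, which is what makes the approach attractive for real-time signal control; a heavier approach on one direction should receive a longer green than a lighter one.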
CLC Number: TP273 [Automation and Computer Technology — Detection Technology and Automatic Devices]