Coordinated Control Method for Multi-Station CSPS Systems Based on State Aggregation  Cited by: 1

Coordinate Control of Multiple CSPS System Based on State Aggregation Method


Authors: Tang Hao (唐昊)[1,2], Pei Rong (裴荣)[2], Zhou Lei (周雷)[2], Tan Qi (谭琦)[1]

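Affiliations: [1] School of Electrical Engineering and Automation, Hefei University of Technology, Hefei 230009; [2] School of Computer and Information, Hefei University of Technology, Hefei 230009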

Source: Acta Automatica Sinica (《自动化学报》), 2014, No. 5, pp. 901-908 (8 pages)

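Funding: Supported by the National Natural Science Foundation of China (61174186; 71231004); the National International Science and Technology Cooperation Program (2011FA10440); the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-11-0626); and the Specialized Research Fund for the Doctoral Program of Higher Education (20130111110007)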

Abstract: In a single conveyor-serviced production station (CSPS) system, reinforcement learning (RL) can explore the state-action space effectively to find a near-optimal look-ahead distance control policy. In the coordinated control problem for a multi-station CSPS system, however, the size of the state space grows exponentially (geometrically) as the number of stations and the buffer capacity increase. The resulting curse of dimensionality degrades both the convergence speed and the optimization performance of the learning algorithm. To address this, we introduce a state aggregation method on top of a local information interaction mechanism among stations, reducing the size and complexity of each station's learning space. First, each station is treated as a relatively independent learning agent that considers only the buffer state of its adjacent downstream station and incorporates it into its own performance-value learning. Second, the original state space is partitioned into several disjoint subsets, each represented by one abstract state. A multi-station state aggregation feedback Q-learning (SAFQL) algorithm is then built on this partition. With this approach, each station's look-ahead distance policy can be optimized over the abstract state space so as to maximize the processing rate of the whole system. Simulation results show that, compared with the general multi-station feedback Q-learning method, SAFQL not only converges faster but also improves the system processing rate to some extent.
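The two ideas in the abstract, state aggregation and per-station learning over an abstract state, can be illustrated with a minimal sketch. This is not the paper's SAFQL algorithm: the abstract does not give the partition, the reward, or the feedback Q-learning update, so the sketch assumes equal-width bins over buffer levels and substitutes a standard discounted tabular Q-learning update. All names (`joint_state_count`, `aggregate`, `abstract_state`, `AggregatedQAgent`) and all parameter values are hypothetical.

```python
import random
from collections import defaultdict


def joint_state_count(num_stations: int, buffer_capacity: int) -> int:
    """Joint buffer states when each station has levels 0..buffer_capacity."""
    return (buffer_capacity + 1) ** num_stations


def aggregate(level: int, capacity: int, num_bins: int) -> int:
    """Map a buffer level to one of num_bins disjoint, contiguous bins.

    Assumption: equal-width bins; the paper's actual partition is not
    described in the abstract.
    """
    if not 0 <= level <= capacity:
        raise ValueError("level must lie in 0..capacity")
    return min(level * num_bins // (capacity + 1), num_bins - 1)


def abstract_state(own_level, downstream_level, capacity, num_bins):
    """Abstract state of one station: its own bin and its downstream neighbor's bin."""
    return (
        aggregate(own_level, capacity, num_bins),
        aggregate(downstream_level, capacity, num_bins),
    )


class AggregatedQAgent:
    """One station's learner over abstract states.

    Assumption: plain discounted Q-learning stands in for the paper's
    feedback Q-learning, whose update rule is not given in the abstract.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = list(actions)  # e.g. candidate look-ahead distances
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)   # (abstract_state, action) -> value
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One temporal-difference update on the abstract state space."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```

Under these assumptions, four stations with buffer capacity 10 have 11^4 = 14641 joint buffer states, while each station's abstract space with three bins per buffer has only 3 × 3 = 9 states, which is the kind of reduction the abstract attributes to aggregation.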

Keywords: multi-station CSPS system; local information interaction; state aggregation; feedback Q-learning

Classification code: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]

 
