A Reinforcement Learning Method Based on Node-Growing k-Means Clustering Algorithm  (Cited by: 13)


Authors: Chen Zonghai (陈宗海)[1], Wen Feng (文锋)[1], Nie Jianbin (聂建斌)[1], Wu Xiaoshu (吴晓曙)[1]

Affiliation: [1] Department of Automation, University of Science and Technology of China, Hefei 230027

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2006, No. 4, pp. 661-666 (6 pages)

Funding: National Natural Science Foundation of China (60575033)

Abstract: State variables in real-world problems are usually continuous and real-valued, whereas standard reinforcement learning methods apply only to problems with a finite set of discrete states. Applying reinforcement learning to real-world problems therefore requires a suitable representation of continuous states. Two main classes of methods exist: parameterized function approximation and adaptive discretization (partitioning) of the state space. Based on an analysis of the advantages and disadvantages of existing adaptive partition methods for continuous state spaces, a partition method based on node-growing k-means clustering is proposed. The algorithm steps of the resulting reinforcement learning method are given for both discrete-action and continuous-action problems. Simulations are conducted on the mountain-car (MountainCar) problem with discrete actions and on the double integrator problem with continuous actions. The results show that the proposed method automatically adjusts the partition resolution according to the distribution of states in the continuous space, achieves an adaptive partition of the continuous state space, and learns the optimal policy.

Keywords: reinforcement learning; k-means clustering algorithm; Sarsa learning; continuous state representation

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
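
The abstract describes partitioning a continuous state space with a k-means procedure whose set of nodes grows as states are observed, combined with Sarsa learning over the resulting cells. The record does not give the growth criterion or the update rules, so the following is only a minimal sketch under stated assumptions: a new node is created when a state lies farther than a distance threshold from every existing node, and otherwise the nearest node's center is moved toward the state (a standard online k-means step). The names `NodeGrowingKMeans`, `grow_threshold`, `learning_rate`, and `sarsa_update` are hypothetical and are not taken from the paper.

```python
import math


class NodeGrowingKMeans:
    """Online k-means whose node set grows when a state is far from all nodes.

    Assumption: growth is triggered by a fixed distance threshold. The paper's
    actual criterion is not stated in the abstract.
    """

    def __init__(self, grow_threshold, learning_rate=0.1):
        self.grow_threshold = grow_threshold  # hypothetical distance threshold
        self.learning_rate = learning_rate    # hypothetical center step size
        self.centers = []                     # one center per partition cell

    def nearest(self, state):
        """Return (index, distance) of the closest center, or (None, inf)."""
        best_index, best_dist = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(center, state)
            if d < best_dist:
                best_index, best_dist = i, d
        return best_index, best_dist

    def observe(self, state):
        """Map a continuous state to a node index, growing a node if needed."""
        index, dist = self.nearest(state)
        if index is None or dist > self.grow_threshold:
            # No node is close enough: add a new node centered at this state.
            self.centers.append(list(state))
            return len(self.centers) - 1
        # Otherwise move the winning center toward the state (online k-means).
        center = self.centers[index]
        for k, x in enumerate(state):
            center[k] += self.learning_rate * (x - center[k])
        return index


def sarsa_update(q, s, a, r, s2, a2, alpha=0.5, gamma=0.99):
    """Standard tabular Sarsa update on a dict keyed by (node, action)."""
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s2, a2), 0.0)
    q[(s, a)] = old + alpha * (target - old)
```

In a discrete-action task such as mountain-car, each observed state would be passed through `observe` to obtain a node index, and that index would serve as the discrete state in `sarsa_update`. Regions visited often or spread widely end up covered by more nodes, which is one simple way to obtain the adaptive resolution the abstract reports.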

 
