Q Learning Based on Probability Support Vector Classification Machine  Times cited: 1

Authors: CHENG Yuhu (程玉虎)[1], GAO Yang (高阳)[1], WANG Xuesong (王雪松)[1]

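Affiliation: [1] School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China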

Source: Journal of China University of Mining & Technology (《中国矿业大学学报》), 2010, No. 3, pp. 408-413 (6 pages)

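Funding: National Natural Science Foundation of China (60804022; 60974050); Program for New Century Excellent Talents in University, Ministry of Education (NCET-08-0836); Specialized Research Fund for the Doctoral Program of Higher Education (20070290537; 200802901506); Natural Science Foundation of Jiangsu Province (BK2008126); Special Grant of the China Postdoctoral Science Foundation (2009025331)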

Abstract: According to the temporal-difference (TD) error criterion, the state-action space of a Q learning system is roughly divided into two classes, positive and negative. To describe the uncertainty of the classification and to avoid the loss of learning precision caused by simple (hard) classification, a probability support vector classification machine (PSVCM) is used so that the classification of each sample has both a qualitative interpretation and a quantitative evaluation. The inputs of the PSVCM are the continuous states and discrete actions of the system, and its output is a class label with a probability value. A Q learning control strategy for a continuous action space is then obtained as the probability-weighted sum of the discrete actions that the PSVCM classifies as positive. Simulation results on the boat-docking problem show that, compared with Q learning based on a traditional support vector classification machine (SVCM), the proposed method effectively handles Q learning control of nonlinear systems with continuous states and continuous actions, and its control performance is insensitive to the setting of the initial action.
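The abstract outlines three steps: labelling state-action pairs by their TD error, turning a classifier score into a class probability, and blending the positively classified discrete actions into one continuous action. The sketch below illustrates these steps under stated assumptions; it is not the authors' implementation. The sign-based labelling rule, the Platt-style sigmoid with placeholder parameters (`slope=-2.0`, `offset=0.0`), the 0.5 decision threshold, normalising the weighted sum by the total probability, and the fallback when no action is positive are all assumptions not given in the abstract. No SVM is trained here, and `toy_decision` is a hypothetical stand-in for a trained PSVCM decision function; the paper's PSVCM may compute its probabilities differently.

```python
import math
from typing import Callable, Sequence


def td_error(reward: float, q_sa: float, q_next_max: float, gamma: float = 0.95) -> float:
    """One-step Q-learning TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    return reward + gamma * q_next_max - q_sa


def td_label(delta: float) -> int:
    """Hypothetical labelling rule: +1 (positive class) if the TD error is
    non-negative, -1 otherwise. The paper's exact criterion is not stated
    in the abstract."""
    return 1 if delta >= 0.0 else -1


def platt_probability(score: float, slope: float = -2.0, offset: float = 0.0) -> float:
    """Platt-style sigmoid mapping an SVM decision value to P(y = +1 | x).
    slope and offset would normally be fitted on held-out data; these
    defaults are placeholders."""
    return 1.0 / (1.0 + math.exp(slope * score + offset))


def continuous_action(
    state: Sequence[float],
    actions: Sequence[float],
    decision: Callable[[Sequence[float], float], float],
) -> float:
    """Blend the discrete actions classified as positive (P >= 0.5),
    weighting each by its probability (assumed: normalised weighted sum).
    `actions` must be non-empty."""
    probs = [platt_probability(decision(state, a)) for a in actions]
    positive = [(p, a) for p, a in zip(probs, actions) if p >= 0.5]
    if not positive:
        # Hypothetical fallback: use the single most probable action.
        best = max(range(len(actions)), key=lambda i: probs[i])
        return actions[best]
    total = sum(p for p, _ in positive)
    return sum(p * a for p, a in positive) / total


if __name__ == "__main__":
    def toy_decision(state: Sequence[float], action: float) -> float:
        # Hypothetical stand-in for a trained SVCM decision function:
        # scores are highest for actions near 0.5 * state[0].
        return 1.0 - 2.0 * abs(action - 0.5 * state[0])

    print(continuous_action([1.0], [-1.0, 0.0, 0.5, 1.0], toy_decision))
```

With the toy decision function, actions 0.0, 0.5 and 1.0 are classified as positive and the blended action comes out at about 0.5. Normalising by the total probability keeps the result inside the range of the discrete action set; an unnormalised weighted sum, which is all the abstract literally specifies, would not guarantee this.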

Keywords: probability; support vector classification machine; TD error; Q learning

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
