Affiliation: [1] School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116
Source: Journal of China University of Mining & Technology (《中国矿业大学学报》), 2010, No. 3, pp. 408-413 (6 pages)
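Funding: National Natural Science Foundation of China (60804022; 60974050); Program for New Century Excellent Talents in University, Ministry of Education (NCET-08-0836); Specialized Research Fund for the Doctoral Program of Higher Education (20070290537; 200802901506); Natural Science Foundation of Jiangsu Province (BK2008126); China Postdoctoral Science Foundation Special Funded Project (2009025331)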
Abstract: The state-action space of a Q-learning system is coarsely divided into positive and negative classes according to a TD-error criterion. To describe the uncertainty of the classification and to avoid the loss of learning precision caused by such a simple classification, a probabilistic support vector classification machine (PSVCM) is used so that the classification of each sample has both a qualitative interpretation and a quantitative evaluation. The inputs of the PSVCM are the continuous states and discrete actions of the system, and its output is a class label with an associated probability. Summing the discrete actions that the PSVCM judges to be positive, weighted by their probabilities, yields a Q-learning control strategy over a continuous action space. Simulation results on a boat-docking problem show that, compared with Q-learning based on a traditional support vector classification machine, the proposed method not only effectively handles Q-learning control of nonlinear systems with continuous states and continuous actions, but also has control performance that is insensitive to the setting of the initial action.
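The probability-weighted action step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function name `continuous_action`, the 0.5 threshold for deciding that an action is in the positive class, the normalization of the weights, and the fallback when no action is positive are all assumptions that the abstract does not specify.

```python
from typing import Sequence


def continuous_action(
    actions: Sequence[float],
    pos_probs: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff for "positive class"
) -> float:
    """Blend discrete actions into one continuous action.

    actions   -- candidate discrete actions (e.g. rudder angles)
    pos_probs -- classifier probability that each action is positive
                 in the current state (hypothetical PSVCM output)
    """
    if len(actions) != len(pos_probs):
        raise ValueError("actions and pos_probs must have equal length")

    # Keep only the actions judged to be in the positive class.
    positive = [(a, p) for a, p in zip(actions, pos_probs) if p > threshold]

    if not positive:
        # Assumed fallback: no positive action, so take the most probable one.
        best_action, _ = max(zip(actions, pos_probs), key=lambda ap: ap[1])
        return best_action

    # Probability-weighted sum, normalized so the result stays within
    # the range spanned by the positive actions (normalization is assumed).
    total = sum(p for _, p in positive)
    return sum(a * p for a, p in positive) / total
```

For example, with candidate actions `[-1.0, 0.0, 1.0]` and positive-class probabilities `[0.2, 0.6, 0.9]`, only the last two actions count as positive under the assumed threshold, and the blended action lies between them, closer to the more confident one.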
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]