Authors: Lu Yongkui [1]; Xu Min [1]; Li Yongxin [1]; Du Huasheng [1]; Wu Yuehua [1]; Yang Jie [1]
Affiliation: [1] Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei 230027, China
Source: Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2006, No. 4, pp. 439-444 (6 pages)
Funding: National 863 Program of China (No. 2001AA422410)
Abstract: A new online adaptive dynamic fuzzy Q-learning (DFQL) method is presented that tunes the structure and parameters of a fuzzy inference system (FIS) online. The system evaluates its past decisions using feedback from the environment, assigns rewards and penalties, and updates its Q-values accordingly. The action output of each rule is determined by the current environment state and the Q-values of the fuzzy reinforcement learner, and fuzzy inference blends the discrete per-rule actions, weighted by the rules' firing strengths, into a continuous control output. An extended greedy exploration-exploitation strategy ensures that every candidate action of each rule is explored during the early learning stage, avoiding convergence to local optima. An eligibility trace combined with a meta-learning rule effectively speeds up learning, and the ε-completeness criterion for fuzzy rules together with a temporal-difference (TD) error criterion governs the generation of new rules. The DFQL approach is applied to the real-time control of a caterpillar robot on an embedded platform for the wall-following task. Experimental results and comparative studies with fuzzy Q-learning and continuous-action Q-learning on the wall-following task demonstrate the superiority of the proposed method.
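The core loop the abstract describes (per-rule ε-greedy action selection, fuzzy blending into a continuous output, a TD error, and eligibility traces) can be sketched compactly. The Python below is a minimal illustration under stated assumptions: Gaussian memberships over a one-dimensional state, one shared discrete action set for all rules, and a fixed ε. All names (FuzzyQLearner, centers, sigma) are illustrative, not from the paper; the meta-learning rule and the ε-completeness/TD-error rule-generation criteria are omitted for brevity.

import numpy as np

class FuzzyQLearner:
    def __init__(self, centers, sigma, actions,
                 alpha=0.1, gamma=0.95, lam=0.7, epsilon=0.2):
        self.centers = np.asarray(centers, dtype=float)  # rule centers over the state
        self.sigma = float(sigma)                        # membership width
        self.actions = np.asarray(actions, dtype=float)  # discrete candidate actions per rule
        self.q = np.zeros((len(self.centers), len(self.actions)))  # per-rule action quality
        self.e = np.zeros_like(self.q)                   # eligibility traces
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon

    def firing(self, x):
        # Normalized firing strength of each fuzzy rule for state x.
        w = np.exp(-0.5 * ((x - self.centers) / self.sigma) ** 2)
        return w / w.sum()

    def act(self, x):
        # Each rule picks one discrete action epsilon-greedily; fuzzy
        # inference blends them into a single continuous output.
        phi = self.firing(x)
        n_rules, n_actions = self.q.shape
        greedy = self.q.argmax(axis=1)
        explore = np.random.rand(n_rules) < self.epsilon
        chosen = np.where(explore,
                          np.random.randint(n_actions, size=n_rules),
                          greedy)
        u = float(phi @ self.actions[chosen])            # continuous control output
        q_taken = float(phi @ self.q[np.arange(n_rules), chosen])
        return u, phi, chosen, q_taken

    def update(self, phi, chosen, q_taken, reward, x_next):
        # TD update: decay all traces, bump the traces of the actions
        # just taken by their firing strengths, then move q along delta.
        phi_next = self.firing(x_next)
        v_next = float(phi_next @ self.q.max(axis=1))
        delta = reward + self.gamma * v_next - q_taken
        self.e *= self.gamma * self.lam
        self.e[np.arange(len(phi)), chosen] += phi
        self.q += self.alpha * delta * self.e
        return delta

# Hypothetical usage on a wall-following-style signal: the state is the
# wall-distance error, the action is a steering correction, and the
# one-line plant below is a stand-in for the real robot dynamics.
learner = FuzzyQLearner(centers=np.linspace(-1.0, 1.0, 5), sigma=0.4,
                        actions=[-0.5, -0.1, 0.0, 0.1, 0.5])
x = 0.3
for _ in range(200):
    u, phi, chosen, q_taken = learner.act(x)
    x_next = x - 0.1 * u          # stand-in dynamics, not the paper's robot
    reward = -abs(x_next)         # smaller distance error is better
    learner.update(phi, chosen, q_taken, reward, x_next)
    x = x_next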
Keywords: fuzzy control; online self-organization; Q reinforcement learning; embedded system; real-time control
Classification: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]