Dynamic Fuzzy Q-Learning Algorithm and Its Real-Time Implementation on an Embedded Platform

Dynamic Fuzzy Q-Learning and Its Real-Time Application in Embedded System


Authors: 卢永奎[1], 许旻[1], 李永新[1], 杜华生[1], 吴月华[1], 杨杰[1]

Affiliation: [1] Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei 230027, China

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2006, No. 4, pp. 439-444 (6 pages)

Fund: Supported by the National High-Tech R&D Program of China (863 Program) (No. 2001AA422410)

Abstract: This paper presents a new online adaptive dynamic fuzzy Q reinforcement learning algorithm. The system evaluates its past decisions from the feedback received from the environment, assigns reward or punishment, updates its Q-values, and automatically adjusts the structure and parameters of the fuzzy controller online. The action output of the current rules is determined by the system's current environmental state together with the Q-values learned through fuzzy-control reinforcement learning, and fuzzy inference then produces a continuous output action. An extended greedy search strategy ensures that every candidate output action of the control rules is explored in the early stage of learning, avoiding convergence to a local optimum. Combining eligibility traces with a meta-learning rule substantially raises the learning rate. A real-time control implementation on an embedded platform, together with comparisons against related studies, verifies the superiority of the algorithm.

A new dynamic fuzzy Q-learning (DFQL) method is presented in this paper which is capable of tuning fuzzy inference systems (FIS) online. In the DFQL system, the generation of continuous actions depends upon a discrete set of actions for every fuzzy rule and the vector of firing strengths of the fuzzy rules. In order to explore the set of possible actions and acquire experience through the reinforcement signals, actions are selected using an exploration-exploitation strategy based on the extended greedy algorithm. A function Q gives the quality of each action; an eligibility trace and a meta-learning rule are used to speed up learning. The ε-completeness of fuzzy rules criterion and the temporal-difference (TD) error criterion are considered for rule generation. The DFQL approach has been applied to the real-time control of a caterpillar robot in the wall-following task. Experimental results and comparative studies with fuzzy Q-learning and continuous-action Q-learning in the wall-following task of mobile robots demonstrate that the proposed DFQL method is superior.

Keywords: fuzzy control; online self-organization; Q reinforcement learning; embedded system; real-time control

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
