Algorithm of TD(λ) Based on Factored Representation

Source: Computer Engineering, 2009, No. 13, pp. 190-192, 195 (4 pages)

Funding: Hunan Provincial Education Commission Fund (project no. 07C083)

Abstract: This paper proposes a new TD(λ) algorithm based on a factored representation. The state is represented as a set of factors (state variables). A Dynamic Bayesian Network (DBN) represents the state-transition probability function of the Markov Decision Process (MDP), and a decision tree represents the state-value function learned by TD(λ). Together these reduce the complexity of searching and computing over the state space, which makes the algorithm suitable for MDPs with large state spaces. Experiments show that the representation is effective.
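To make the three ingredients of the abstract concrete (factored states, a DBN transition model, and a decision-tree value function updated by TD(λ)), here is a minimal sketch under our own assumptions. It is not the authors' algorithm. The MDP is hypothetical: three binary state variables under one fixed policy, so the DBN has no action input. The decision tree is hand-built and held fixed, because the abstract does not say how the tree is constructed. The update is standard backward-view TD(λ) with accumulating eligibility traces kept on the tree's leaves. The names `DBN`, `sample_next`, `reward`, `build_tree` and `td_lambda`, and all probabilities and parameters, are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

State = Tuple[int, ...]  # one binary value per state variable (factor)

# Hypothetical DBN for one fixed policy: for each next-step variable X_i',
# (indices of its parents in the current state, CPT giving P(X_i' = 1 | parents)).
DBN: Dict[int, Tuple[Tuple[int, ...], Dict[Tuple[int, ...], float]]] = {
    0: ((0,), {(0,): 0.1, (1,): 0.9}),
    1: ((0, 1), {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.8, (1, 1): 1.0}),
    2: ((1,), {(0,): 0.2, (1,): 0.7}),
}


def sample_next(state: State, rng: random.Random) -> State:
    """Sample s' one factor at a time; each X_i' depends only on its DBN parents."""
    nxt = []
    for i in range(len(state)):
        parents, cpt = DBN[i]
        p_one = cpt[tuple(state[j] for j in parents)]
        nxt.append(1 if rng.random() < p_one else 0)
    return tuple(nxt)


def reward(state: State) -> float:
    """Hypothetical reward: 1 whenever X2 is on in the state just entered."""
    return 1.0 if state[2] == 1 else 0.0


@dataclass
class Leaf:
    value: float = 0.0  # value estimate shared by all states reaching this leaf
    trace: float = 0.0  # eligibility trace of this leaf


@dataclass
class Node:
    var: int      # index of the state variable tested here
    low: "Tree"   # subtree for var == 0
    high: "Tree"  # subtree for var == 1


Tree = Union[Leaf, Node]


def find_leaf(tree: Tree, state: State) -> Leaf:
    """Walk the decision tree until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.high if state[tree.var] else tree.low
    return tree


def build_tree() -> Tuple[Tree, List[Leaf]]:
    """Hypothetical hand-built tree: test X1, then X0; X2 is never tested."""
    leaves = [Leaf() for _ in range(4)]
    tree = Node(1, Node(0, leaves[0], leaves[1]), Node(0, leaves[2], leaves[3]))
    return tree, leaves


def td_lambda(tree: Tree, leaves: List[Leaf], episodes: int = 200, steps: int = 50,
              alpha: float = 0.05, gamma: float = 0.9, lam: float = 0.8,
              seed: int = 0) -> None:
    """Backward-view TD(lambda) with accumulating traces over the tree's leaves."""
    rng = random.Random(seed)
    for _ in range(episodes):
        for leaf in leaves:
            leaf.trace = 0.0
        s = tuple(rng.randint(0, 1) for _ in range(len(DBN)))
        for _ in range(steps):
            s2 = sample_next(s, rng)
            cur, nxt = find_leaf(tree, s), find_leaf(tree, s2)
            delta = reward(s2) + gamma * nxt.value - cur.value  # TD error
            cur.trace += 1.0  # the visited leaf becomes eligible
            for leaf in leaves:
                leaf.value += alpha * delta * leaf.trace
                leaf.trace *= gamma * lam  # decay every trace
            s = s2


if __name__ == "__main__":
    tree, leaves = build_tree()
    td_lambda(tree, leaves)
    for x0 in (0, 1):
        for x1 in (0, 1):
            print((x0, x1), round(find_leaf(tree, (x0, x1, 0)).value, 2))
```

In this toy model X2 is never a parent in the DBN and the reward only reads X2 in the next state, so the value function depends on X0 and X1 alone. Four leaves therefore replace the eight entries of a flat value table, which illustrates the kind of saving the abstract attributes to the factored representation.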

Keywords: factored representation; dynamic Bayesian network; decision tree; TD(λ) algorithm

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
