Affiliation: [1] Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027
Source: 《中国科学技术大学学报》 (Journal of University of Science and Technology of China, JUSTC), 2001, No. 5, pp. 549-557 (9 pages)
Funding: Supported by the National Natural Science Foundation of China (69974037) and the National High Performance Computing Foundation (00208)
Abstract: Motivated by the need for online optimization of real-world engineering systems, this paper studies single-sample-path-based optimization algorithms for Markov control processes under randomized stationary policies. The analysis builds on the theory of Markov performance potentials, and the policies are represented by approximation architectures such as neural networks. Unlike traditional computation-based approaches, the policy parameters are updated iteratively, and an optimal (or suboptimal) randomized stationary policy can be found from a sample path obtained by observing the operation of a real system. This optimization method is a form of neuro-dynamic programming. The algorithms adapt to different real systems given a suitable choice of algorithm parameters. Finally, the convergence of the algorithms with probability one on an infinite sample path is analyzed, and a numerical example for a three-state controlled Markov chain is provided.
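The sample-path idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only estimates the long-run average cost of a three-state controlled Markov chain from one simulated trajectory and checks it against the value obtained from the stationary distribution. The transition matrices `P`, the costs `COST`, the softmax parameterization `theta`, and all function names are assumptions made for illustration; none of them come from the paper.

```python
import math
import random

# Hypothetical three-state model: P[a][i][j] is the probability of moving
# from state i to state j under action a. The numbers are illustrative only.
P = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],  # action 0
    [[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.5, 0.1, 0.4]],  # action 1
]
# Hypothetical one-step costs COST[i][a] for state i and action a.
COST = [[1.0, 2.0], [3.0, 0.5], [2.0, 4.0]]


def action_probs(theta, state):
    """Randomized stationary policy: softmax over theta[state][a]."""
    weights = [math.exp(t) for t in theta[state]]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_average_cost(theta, steps, seed=0):
    """Estimate the long-run average cost from a single sample path."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(steps):
        probs = action_probs(theta, state)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        total += COST[state][action]
        state = rng.choices(range(3), weights=P[action][state])[0]
    return total / steps


def exact_average_cost(theta, iterations=2000):
    """Reference value from the stationary distribution (power iteration)."""
    n = 3
    # Transition matrix of the chain induced by the randomized policy.
    mixed = [[sum(action_probs(theta, i)[a] * P[a][i][j] for a in range(2))
              for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * mixed[i][j] for i in range(n)) for j in range(n)]
    # Expected one-step cost in each state under the policy.
    cost = [sum(action_probs(theta, i)[a] * COST[i][a] for a in range(2))
            for i in range(n)]
    return sum(pi[i] * cost[i] for i in range(n))
```

Calling `simulate_average_cost(theta, 100000)` with, for example, `theta = [[0.0, 0.0], [0.5, -0.5], [-1.0, 1.0]]` should land close to `exact_average_cost(theta)`. A parameter-update rule of the kind the abstract describes would adjust `theta` using such sample-path estimates.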
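Keywords: Markov performance potential theory; Markov control processes; randomized stationary policy; sample path; neuro-dynamic programming; stochastic decision problem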
Classification number: O231.3 [Science: Operations Research and Cybernetics]; O221.3 [Science: Mathematics]