Authors: KARA Ali Devran; BAYRAKTAR Erhan; YUKSEL Serdar
Affiliations: [1] Department of Mathematics, Florida State University, FL 32306-2400, USA; [2] Department of Mathematics, University of Michigan, MI 48109, USA; [3] Department of Mathematics and Statistics, Queen's University, ON K7L 3N6, Canada
Source: Journal of Systems Science & Complexity, 2025, No. 1, pp. 238-270 (33 pages)
Fund: Partially supported by the National Science Foundation under Grant No. DMS-2106556 and by the Susan M. Smith chair; partially supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.
Abstract: The authors study an approximation method for partially observed Markov decision processes (POMDPs) with continuous spaces. Belief MDP reduction, the standard approach to studying POMDPs, requires rigorous approximation methods for practical applications because the state space is lifted to the space of probability measures. Generalizing recent work, in this paper the authors present rigorous approximation methods that discretize the observation space and construct a fully observed finite MDP model from a finite-length history of the discrete observations and control actions. The authors show that the resulting policy is near-optimal under some regularity assumptions on the channel and under certain controlled filter stability requirements for the hidden state process. The authors also provide a Q-learning algorithm that uses a finite memory of discretized information variables, and prove its convergence to the optimality equation of the finite fully observed MDP constructed using the approximation method.
Keywords: Filter stability; POMDP; reinforcement learning; stochastic control
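As a rough illustration of the scheme described in the abstract (discretize the observations, keep a finite window of past discrete observations and actions as the state, and run tabular Q-learning on the resulting finite fully observed MDP), a minimal sketch follows. It assumes a Gymnasium-style environment interface with a scalar observation; the quantizer, memory length, and all other names and parameters are illustrative assumptions, not the authors' implementation.

import numpy as np
from collections import deque, defaultdict

# Illustrative sketch only: quantize the continuous observation, keep a finite
# window of (quantized observation, action) pairs as the state of a finite MDP,
# and run standard tabular Q-learning on that constructed state space.

N_BINS = 8        # number of quantization cells for the (scalar) observation (assumed)
N_MEMORY = 2      # finite memory length: how many past (obs, action) pairs to keep (assumed)
N_ACTIONS = 3     # assumed finite action set
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def quantize(y, low=-1.0, high=1.0, n_bins=N_BINS):
    """Map a continuous observation y to a discrete cell index."""
    y = np.clip(y, low, high)
    return int((y - low) / (high - low) * (n_bins - 1))

Q = defaultdict(lambda: np.zeros(N_ACTIONS))   # Q-table over finite-memory states

def run_episode(env, steps=1000):
    """One Q-learning episode on the finite-memory, discretized information state."""
    memory = deque(maxlen=N_MEMORY)            # sliding window of (discrete obs, action)
    y, _ = env.reset()                         # Gymnasium-style interface (assumed)
    memory.append((quantize(y), 0))
    state = tuple(memory)
    for _ in range(steps):
        # epsilon-greedy action selection
        if np.random.rand() < EPS:
            a = np.random.randint(N_ACTIONS)
        else:
            a = int(np.argmax(Q[state]))
        y_next, r, done, *_ = env.step(a)
        memory.append((quantize(y_next), a))
        next_state = tuple(memory)
        # tabular Q-learning update on the constructed finite MDP
        Q[state][a] += ALPHA * (r + GAMMA * np.max(Q[next_state]) - Q[state][a])
        state = next_state
        if done:
            break

In this sketch the truncated window of discretized observations and actions plays the role of an approximate information state; the abstract's filter stability requirement is what makes such a finite memory near-sufficient for control, so the learned policy can be near-optimal for the original POMDP.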