Authors: CHEN Ling-Min (陈灵敏); FENG Yu (冯宇); LI Yong-Qiang (李永强) [1] (College of Information Engineering, Zhejiang University of Technology, Hangzhou 313000)
Source: Acta Automatica Sinica (《自动化学报》), 2024, No. 4, pp. 828-840 (13 pages)
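Funding: Supported by the National Natural Science Foundation of China (61973276, 62073294) and the Zhejiang Provincial Natural Science Foundation (LZ21F030003).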
Abstract: The pursuit-evasion problem has practical significance in fields such as confrontation, tracking, and search. Within the framework of continuous stochastic games and Markov decision processes (MDPs), this paper studies optimal strategies for the multi-pursuer, single-evader problem when only measured distances are available. In this problem, only the leader of the pursuers can measure its relative distance to the evader, while the evader has a global view. The strategies are computed in two parts: a pursuit game for the pursuers and an MDP for the evader. For the pursuers' strategy, the environment is partitioned to introduce a belief region state that estimates the evader's position, and this belief region state is corrected using the measured distances. A continuous stochastic pursuit game is then constructed on the belief region state, and the existence of a stationary Nash equilibrium strategy is proved with a fixed-point theorem. For the evader's strategy, the evader uses its global information to build an MDP over mixed states together with the corresponding Bellman optimality equation. A reinforcement learning based algorithm for computing stationary pursuit-evasion strategies is also given, and its effectiveness is verified on an example.
Keywords: pursuit-evasion problem; belief region state; continuous stochastic game; Markov decision process; reinforcement learning
Classification: O225 [Science: Operations Research and Cybernetics]; TP18 [Science: Mathematics]