Distance Information Based Pursuit-Evasion Strategy: Continuous Stochastic Game With Belief State

Cited by: 1

Authors: CHEN Ling-Min; FENG Yu; LI Yong-Qiang[1]

Affiliation: [1] College of Information Engineering, Zhejiang University of Technology, Hangzhou 313000

Source: Acta Automatica Sinica, 2024, No. 4, pp. 828-840 (13 pages)

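Funding: Supported by the National Natural Science Foundation of China (61973276, 62073294) and the Natural Science Foundation of Zhejiang Province (LZ21F030003).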

Abstract: The pursuit-evasion problem is of great practical importance in confrontation, tracking, and search. Within the framework of continuous stochastic games and Markov decision processes (MDPs), this paper studies optimal strategies for the multi-pursuer, single-evader problem when only measured distances are available. In this problem, only the leader of the pursuers can measure its relative distance to the evader, while the evader has a global view. The strategies are obtained in two parts: a pursuit game and an MDP. For the pursuers' strategy, a belief region state is introduced by partitioning the environment to estimate the evader's position, and this belief region state is corrected using the measured distances. A continuous stochastic pursuit game is then constructed on the belief region state, and the existence of stationary Nash equilibrium strategies is proved via a fixed-point theorem. For the evader's strategy, the evader uses global information to establish an MDP over hybrid states together with the corresponding Bellman optimality equation. A reinforcement-learning-based algorithm for computing stationary pursuit-evasion strategies is also presented, and a case study verifies its effectiveness.
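The abstract's idea of correcting a belief region state with a measured distance can be illustrated with a minimal sketch. It is not the authors' algorithm: the rectangular grid partition, the hard consistency test, the `tol` parameter, and the names `make_grid`, `region_distance_range`, and `correct_belief` are all assumptions made for illustration. A region keeps its belief mass only if the measured leader-to-evader distance could be realised by some point in that region.

```python
import math

def make_grid(width, height, nx, ny):
    """Partition a width x height rectangle into nx * ny cells (assumed partition).
    Each cell is (x0, y0, x1, y1), listed row by row."""
    return [(i * width / nx, j * height / ny,
             (i + 1) * width / nx, (j + 1) * height / ny)
            for j in range(ny) for i in range(nx)]

def region_distance_range(leader, cell):
    """Smallest and largest Euclidean distance from the leader to any point in the cell."""
    px, py = leader
    x0, y0, x1, y1 = cell
    dx_min = max(x0 - px, 0.0, px - x1)
    dy_min = max(y0 - py, 0.0, py - y1)
    dx_max = max(abs(px - x0), abs(px - x1))
    dy_max = max(abs(py - y0), abs(py - y1))
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)

def correct_belief(belief, cells, leader, measured, tol=0.0):
    """Zero out regions inconsistent with the measured distance, then renormalise.
    `tol` is a hypothetical allowance for measurement error."""
    weights = []
    for b, cell in zip(belief, cells):
        lo, hi = region_distance_range(leader, cell)
        weights.append(b if lo - tol <= measured <= hi + tol else 0.0)
    total = sum(weights)
    if total == 0.0:
        # No region is consistent: keep the prior rather than divide by zero.
        return list(belief)
    return [w / total for w in weights]

# Usage: 2 x 2 partition of a 2 x 2 area, leader at the origin, uniform prior.
cells = make_grid(2.0, 2.0, 2, 2)
posterior = correct_belief([0.25] * 4, cells, (0.0, 0.0), measured=2.5)
```

With the leader at the origin and a measured distance of 2.5, only the far corner cell can contain the evader, so all belief mass moves there. A probabilistic measurement model would replace the hard consistency test with a likelihood weight.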

Keywords: pursuit-evasion problem; belief region state; continuous stochastic game; Markov decision process; reinforcement learning

Classification: O225 [Science: Operations Research and Cybernetics]; TP18 [Science: Mathematics]

 
