Research on a Bounded Rational Game Algorithm for Ship Target Tracking Based on Reinforcement Learning

Authors: CHEN Suxia; XU Qingwen; LIU Jiufu [2]; XIE Hui; LIU Xiangwu

Affiliations: [1] Department of Computer and Art Design, Henan Light Industry Vocational College, Zhengzhou 450008, China; [2] College of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Source: Computer Engineering and Applications, 2024, No. 20, pp. 116-123 (8 pages)

Funding: National Natural Science Foundation of China (61473144).

Abstract: Because real-world decision-makers cannot always analyze a problem with perfect rationality, a pursuit-evasion game algorithm under bounded rationality is proposed. A pursuit-evasion game model is established, and the saddle-point strategies of the two players under perfect rationality are solved first. The bounded-rationality level-k model is then introduced. It makes structural assumptions about the depth of strategic reasoning of the pursuer and the evader and allows the two players to differ in reasoning ability. The value function and strategy for each level are given, and the strategies satisfy the Hamilton-Jacobi-Isaacs (HJI) equation. As the level increases, the strategies eventually converge to the Nash equilibrium. Because the HJI equation is difficult to solve directly, it is solved with an actor-critic algorithm based on reinforcement learning, and the algorithm is designed so that the pursuer can estimate the evader's thinking level and adopt a suitable strategy. Taking ships as the object of study, ship motion is simplified to a two-dimensional mathematical model, a ship pursuit-evasion game model is established, and the algorithm is verified by simulation.
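The level-k idea in the abstract can be illustrated with a deliberately small sketch. It is not the paper's method: instead of an HJI equation solved by actor-critic learning, it assumes a one-step scalar game with a hypothetical quadratic cost, a non-strategic level-0 evader, and closed-form best responses. The cost, the weights `R_U` and `R_V`, the convention that the level-k pursuer responds to the level-(k-1) evader, and all function names are assumptions made for this illustration only.

```python
"""Toy level-k reasoning in a one-step scalar pursuit-evasion game (illustrative only)."""

R_U = 1.0  # assumed penalty on pursuer effort
R_V = 3.0  # assumed penalty on evader effort; must exceed 1 so the evader's problem is concave


def cost(x, u, v, r_u=R_U, r_v=R_V):
    """Hypothetical cost: the pursuer minimizes it, the evader maximizes it."""
    return (x + u + v) ** 2 + r_u * u ** 2 - r_v * v ** 2


def pursuer_best_response(x, v, r_u=R_U):
    # From dJ/du = 2(x + u + v) + 2 r_u u = 0
    return -(x + v) / (1.0 + r_u)


def evader_best_response(x, u, r_v=R_V):
    # From dJ/dv = 2(x + u + v) - 2 r_v v = 0
    return (x + u) / (r_v - 1.0)


def level_k_policies(x, max_level, r_u=R_U, r_v=R_V):
    """Return {k: (u_k, v_k)}. Level 0 is a placeholder in which neither player acts."""
    policies = {0: (0.0, 0.0)}
    v_prev = 0.0
    for k in range(1, max_level + 1):
        u_k = pursuer_best_response(x, v_prev, r_u)  # pursuer answers the level-(k-1) evader
        v_k = evader_best_response(x, u_k, r_v)      # evader answers the level-k pursuer
        policies[k] = (u_k, v_k)
        v_prev = v_k
    return policies


def saddle_point(x, r_u=R_U, r_v=R_V):
    """Solve both first-order conditions simultaneously (perfect rationality)."""
    v_star = x * r_u / ((r_v - 1.0) * (1.0 + r_u) + 1.0)
    u_star = -(x + v_star) / (1.0 + r_u)
    return u_star, v_star


def estimate_evader_level(observed_v, policies):
    """Pick the level whose predicted evader action is closest to the observed one."""
    return min(policies, key=lambda k: abs(policies[k][1] - observed_v))


if __name__ == "__main__":
    x0 = 1.0  # initial gap between evader and pursuer
    for level, (u, v) in level_k_policies(x0, 6).items():
        print(level, round(u, 5), round(v, 5))
    print("saddle point:", saddle_point(x0))
```

With these assumed weights the best-response map is a contraction, so the level-k actions approach the saddle point as k grows, which mirrors the convergence claim in the abstract. In general games, iterated best responses need not converge, so this behavior depends on the chosen cost.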

Keywords: pursuit-evasion game; target tracking; reinforcement learning; bounded rationality

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
