Improved Game Learning Method for UAV Swarm Cooperative Hunting (cited by: 2)

Authors: LIU Jing; HUA Xiang [1,2]; ZHANG Jinjin [1]

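Affiliations: [1] School of Defence Science and Technology, Xi’an Technological University, Xi’an 710021, China; [2] School of Electronic Information Engineering, Xi’an Technological University, Xi’an 710021, China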

Source: Journal of Xi’an Technological University, 2023, No. 3, pp. 277-286 (10 pages)

Funding: Shaanxi Province Key Research and Development Program (2023 YBGY 227); Xi’an Science and Technology Plan Project (2022JH RYFW 0138).

Abstract: To address the problem of a UAV swarm cooperatively hunting a single intelligent target, this paper proposes a cooperative hunting method based on improved game learning. From the kinematic relationship between the swarm and the target, a cooperative hunting model is established that combines game theory with the Apollonius circle. Based on the cooperative relationship among the swarm members and the game relationship between the pursuing and escaping parties, the greedy factor is adjusted dynamically using the Q-learning algorithm and the mean of the learned rewards, so as to build and refine the state-action matrix. The Nash equilibrium of the payoff matrix is then solved from the state-action matrix, which completes the cooperative hunting of the single target by the swarm. The simulation results show that, compared with the conventional Q-learning algorithm, the average reward obtained by each hunting UAV is 48%, 32.4%, and 50.8% higher, respectively, and the average number of steps required to complete the hunting task is reduced by 58.7%. The method can therefore hunt a single target effectively and with better time efficiency.
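The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the function names, the tuple-based coordinates, and in particular the `adjust_epsilon` rule (lower the greedy factor when the mean reward improves, raise it otherwise) are assumptions made here for illustration. Only the Apollonius circle geometry and the tabular Q-learning update are standard results.

```python
import math
import random
from collections import defaultdict


def apollonius_circle(pursuer, evader, speed_ratio):
    """Circle of points X with |X - evader| = k * |X - pursuer|.

    k = speed_ratio = evader speed / pursuer speed (k > 0, k != 1).
    Both agents reach any point on this circle at the same time.
    Returns (center, radius).
    """
    k2 = speed_ratio ** 2
    if speed_ratio <= 0 or math.isclose(k2, 1.0):
        raise ValueError("speed_ratio must be positive and different from 1")
    px, py = pursuer
    ex, ey = evader
    center = ((ex - k2 * px) / (1 - k2), (ey - k2 * py) / (1 - k2))
    radius = speed_ratio * math.hypot(ex - px, ey - py) / abs(1 - k2)
    return center, radius


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on a dict keyed by (state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def adjust_epsilon(epsilon, mean_reward, prev_mean_reward,
                   step=0.05, eps_min=0.01, eps_max=0.9):
    """Hypothetical adjustment rule (an assumption, not taken from the paper).

    Explore less when the mean learned reward improved, explore more otherwise.
    """
    if mean_reward > prev_mean_reward:
        return max(eps_min, epsilon - step)
    return min(eps_max, epsilon + step)


def choose_action(q, state, actions, epsilon, rng=random):
    """Epsilon-greedy selection: random action with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    # Pursuer at the origin, evader 3 units away and half as fast.
    print(apollonius_circle((0.0, 0.0), (3.0, 0.0), 0.5))
    q = defaultdict(float)
    q_update(q, "s0", "left", 1.0, "s1", ["left", "right"], alpha=0.5)
    print(choose_action(q, "s0", ["left", "right"], epsilon=0.0))
```

In a full pipeline, each hunting UAV would presumably keep its own table, and the learned values would feed the payoff matrix whose Nash equilibrium selects the joint action. That step is omitted here because the abstract does not specify how the payoff matrix is built.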

Keywords: UAV swarm; cooperative hunting; game theory; Apollonius circle; Q-learning

Classification: TP242 [Automation and Computer Technology: Detection Technology and Automatic Devices]
