UAV cluster cooperative combat decision-making method based on deep reinforcement learning  Cited by: 3

Authors: ZHAO Lin (赵琳) [1]; LYU Ke (吕科) [1]; GUO Jing (郭靖) [2]; HONG Chen (宏晨) [3]; XIANG Xiancai (向贤财) [1]; XUE Jian (薛健) [1]; WANG Yong (王泳) [4]

Affiliations: [1] School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China; [2] College of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang, Liaoning 110136, China; [3] College of Robotics, Beijing Union University, Beijing 100101, China; [4] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China

Source: Journal of Computer Applications (《计算机应用》), 2023, No. 11, pp. 3641-3646 (6 pages)

Funding: National Key Research and Development Program of China (2018AAA0100804).

Abstract: When an Unmanned Aerial Vehicle (UAV) cluster attacks ground targets, it is divided into two formations: a strike UAV cluster that attacks the main targets and an auxiliary UAV cluster that pins down the enemy. When the auxiliary UAVs choose between two action strategies, aggressive attack or preserving strength, the mission scenario resembles a public goods game in which cooperators earn less than defectors. Based on this, a cooperative combat decision-making method for UAV clusters based on deep reinforcement learning is proposed. First, a UAV cluster combat model based on the public goods game is built to simulate the conflict between individual and collective interests that arises when intelligent UAV clusters cooperate. Second, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is used to solve for the most reasonable combat decision of the auxiliary UAV cluster, so that the cluster wins at minimum loss cost. Training and experiments were carried out with different numbers of UAVs. The results show that, compared with the training performance of two algorithms, IDQN (Independent Deep Q-Network) and ID3QN (Imitative Dueling Double Deep Q-Network), the proposed algorithm converges best, reaches a 100% win rate with four auxiliary UAVs, and also clearly outperforms the comparison algorithms for other numbers of UAVs.
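
The public goods game dilemma mentioned in the abstract can be illustrated with a minimal sketch. This uses the standard textbook form of the game, not the combat model from the paper. The function name `public_goods_payoffs`, the parameters `cost` and `multiplier`, and the mapping of "aggressive attack" to contributing and "preserving strength" to not contributing are all assumptions made for illustration.

```python
from typing import List


def public_goods_payoffs(
    contributes: List[bool],
    cost: float = 1.0,
    multiplier: float = 2.0,
) -> List[float]:
    """Payoffs in a textbook public goods game (illustrative only).

    Each player who contributes pays `cost` into a common pool. The pool
    is scaled by `multiplier` and split equally among ALL players,
    whether or not they contributed.
    """
    n = len(contributes)
    if n == 0:
        return []
    pool = multiplier * cost * sum(contributes)
    share = pool / n
    # Contributors (cooperators) bear the cost; non-contributors do not.
    return [share - cost if c else share for c in contributes]


# Hypothetical scenario: four auxiliary UAVs, two attack aggressively
# (contribute) and two preserve strength (do not contribute).
mixed = public_goods_payoffs([True, True, False, False])
all_cooperate = public_goods_payoffs([True] * 4)
all_defect = public_goods_payoffs([False] * 4)
```

With these assumed values (four players, `multiplier` between 1 and the number of players), each non-contributor in the mixed case earns more than each contributor, yet everyone does better under full cooperation than under full defection. That is the tension between individual and collective interest the abstract refers to.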

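Keywords: Unmanned Aerial Vehicle (UAV); multi-cluster; public goods game; Multi-Agent Deep Deterministic Policy Gradient (MADDPG); cooperative combat decision-making method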

Classification code: V279.2 [Aerospace Science and Technology: Flight Vehicle Design]

 
