Reinforcement Learning-Based Target Assignment Method for Many-to-Many Interceptions  Cited by: 2


Authors: GUO Jianguo (郭建国) [1]; HU Guanjie (胡冠杰); XU Xinpeng (许新鹏); LIU Yue (刘悦); CAO Jin (曹晋)

Affiliations: [1] Institute of Precision Guidance and Control, School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; [2] Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China

Source: Air & Space Defense (《空天防御》), 2024, No. 1, pp. 24-31 (8 pages)

Funding: National Natural Science Foundation of China (61973254, 92271109, 52272404).

Abstract: To address the weapon-target assignment problem for many-to-many interception in air confrontation environments, this study proposes a multi-target intelligent assignment method based on reinforcement learning. For the many-to-many interception engagement scenario, a mathematical model of target assignment is built on an evaluation of the engagement situation. Introducing the concepts of target threat degree and interception effectiveness degree captures the interception urgency of each target and the interception capability of each interceptor, giving a comprehensive evaluation of the engagement situation between the attacking and defending sides. On the basis of this model, the target assignment problem is formulated as a Markov decision process and solved by training a reinforcement learning algorithm based on a deep Q-network. Through self-learning under environment interaction and a reward mechanism, the method dynamically generates optimal assignment schemes. A many-to-many interception scenario is constructed by mathematical simulation to verify the method's effectiveness; the results show that the trained target assignment method meets the requirements of continuous, dynamic task assignment in many-to-many interception.
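The Markov decision process formulation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the deep Q-network for tabular Q-learning so that it runs without any deep learning library, and it assumes a simple sequential setup in which interceptors are assigned one at a time and the reward is the expected threat removed by each assignment. The threat and effectiveness values, and all function names, are hypothetical.

```python
import itertools
import random

# Hypothetical illustrative data (not taken from the paper).
THREAT = [0.9, 0.6, 0.3]        # assumed threat degree of each target
EFFECT = [[0.8, 0.5, 0.4],      # EFFECT[i][j]: assumed effectiveness of
          [0.6, 0.7, 0.5],      # interceptor i against target j
          [0.5, 0.6, 0.9],
          [0.7, 0.4, 0.6]]
N_INT, N_TGT = len(EFFECT), len(THREAT)


def step_reward(state, action):
    """Expected threat removed by sending the next interceptor to `action`.

    `state` is the tuple of targets chosen for interceptors 0..len(state)-1.
    """
    i = len(state)  # index of the interceptor being assigned now
    survive = 1.0   # probability the target survived earlier assignments
    for k, j in enumerate(state):
        if j == action:
            survive *= 1.0 - EFFECT[k][j]
    return THREAT[action] * survive * EFFECT[i][action]


def total_value(assignment):
    """Sum of step rewards, i.e. total expected threat removed."""
    return sum(step_reward(assignment[:i], a) for i, a in enumerate(assignment))


def train(episodes=5000, alpha=0.5, gamma=1.0, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = ()
        while len(state) < N_INT:
            action = rng.randrange(N_TGT)
            nxt = state + (action,)
            if len(nxt) == N_INT:   # terminal: no future reward
                future = 0.0
            else:
                future = max(q.get((nxt, a), 0.0) for a in range(N_TGT))
            target = step_reward(state, action) + gamma * future
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            state = nxt
    return q


def greedy_assignment(q):
    """Roll out the learned greedy policy from the empty assignment."""
    state = ()
    while len(state) < N_INT:
        best = max(range(N_TGT), key=lambda a: q.get((state, a), 0.0))
        state = state + (best,)
    return state


def brute_force_best():
    """Exhaustive search, used only to check the learned policy."""
    return max(itertools.product(range(N_TGT), repeat=N_INT), key=total_value)


if __name__ == "__main__":
    learned = greedy_assignment(train())
    print("learned assignment:", learned, "value:", total_value(learned))
    print("brute-force value:", total_value(brute_force_best()))
```

Because Q-learning is off-policy, the sketch can explore with purely random actions and still recover the greedy assignment; on a problem this small the result can be checked against exhaustive search. A deep Q-network would replace the dictionary `q` with a neural function approximator when the state space is too large to enumerate.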

Keywords: weapon-target assignment; multi-target interception; situation assessment; reinforcement learning; deep Q-network

Classification code: E927 [Military Science: Military Equipment]

 
