A New Intelligent Decision-Making Method for Air-Sea Joint Operation Based on Deep Reinforcement Learning

Times cited: 1

Authors: SONG Xiaocheng (宋晓程); FENG Shuting (冯舒婷); LI Zhi (李陟); JIA Zhengxuan (贾政轩); ZHOU Guojin (周国进); YE Dong (叶东)[3] (Beijing Institute of Electronic System Engineering, Beijing 100854, P. R. China; Beijing Huashu Defense Technology Co. Ltd, Beijing 100084, P. R. China; Research Center of Satellite Technology, Harbin Institute of Technology, Harbin 150080, P. R. China)

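Affiliations: [1] Beijing Institute of Electronic System Engineering, Beijing 100854, China; [2] Beijing Huashu Defense Technology Co. Ltd, Beijing 100084, China; [3] Research Center of Satellite Technology, Harbin Institute of Technology, Harbin 150080, China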

Source: Transactions of Nanjing University of Aeronautics and Astronautics (南京航空航天大学学报(英文版)), 2023, Issue 1, pp. 25-36 (12 pages)

Funding: Supported by the National Natural Science Foundation of China (Nos. 62073102, 62203145) and the China Postdoctoral Science Foundation (No. 2022M710948).

Abstract: Air-sea joint operations involve complex combat scenarios with many types of equipment and a high degree of uncertainty. To address this difficulty, a new intelligent decision-making method for air-sea joint operations based on deep reinforcement learning is proposed. To uniformly represent the inputs and outputs of the complex network and the correspondence between them, the method combines perceptrons, a deep long short-term memory (LSTM) network, and an actor-critic structure. To address the instability of the policy network's learning process and the shortcomings of the proximal policy optimization (PPO) algorithm, an improved PPO algorithm is proposed. To cope with the variability of the opponent's strategy during self-learning of the policy network, a new method for selecting baseline policy models based on model performance and model diversity is proposed. Experiments demonstrate that the proposed method is effective and stable for air-sea joint operation decision-making. In the 4th Wargaming Competition hosted by the Chinese Institute of Command and Control, the method achieved a 97% winning rate over more than 100 rounds against rule-based decision-making algorithms and human players, about 20% higher than that of the rule-based decision-making algorithm.
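The abstract does not describe how the authors modified PPO, so the sketch below shows only the clipped surrogate objective of standard PPO, the baseline that the improved algorithm starts from; it is background, not the paper's improved variant. The function name `ppo_clip_objective` and the default clipping range `eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate objective of standard PPO (to be maximized).

    Hypothetical helper for illustration, not the paper's improved PPO.
    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for the same actions.
    eps: clipping range (0.2 is an assumed, commonly used default).
    """
    n = len(advantages)
    if n == 0 or not (len(new_logp) == len(old_logp) == n):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms,
        # which removes the incentive to move the policy too far in one update.
        total += min(ratio * adv, clipped * adv)
    return total / n
```

When the old and new log-probabilities are identical the ratio is 1, so the objective reduces to the mean advantage; for example, `ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])` returns `2.0`. The clipping caps the gain from raising an action's probability once the ratio exceeds `1 + eps`, which is the mechanism PPO uses to keep updates stable.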

Keywords: air-sea joint operation; deep reinforcement learning; proximal policy optimization; intelligent decision-making

CLC number: TP273 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

 
