Authors: SONG Xiaocheng, FENG Shuting, LI Zhi, JIA Zhengxuan, ZHOU Guojin, YE Dong[3]
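Affiliations: [1] Beijing Institute of Electronic System Engineering, Beijing 100854, China; [2] Beijing Huashu Defense Technology Co. Ltd, Beijing 100084, China; [3] Research Center of Satellite Technology, Harbin Institute of Technology, Harbin 150080, China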
Source: Transactions of Nanjing University of Aeronautics and Astronautics (English edition of the Journal of Nanjing University of Aeronautics and Astronautics), 2023, No. 1, pp. 25-36 (12 pages)
Funding: Supported by the National Natural Science Foundation of China (Nos. 62073102, 62203145) and the China Postdoctoral Science Foundation (No. 2022M710948).
Abstract: To address the high uncertainty of complex, multi-equipment combat scenarios in air-sea joint operations, a new intelligent decision-making method for air-sea joint operations based on deep reinforcement learning is proposed. To represent the inputs and outputs of the complex network, and the correspondence between them, in a unified way, the method combines a perceptron, a deep long short-term memory (LSTM) network, and an actor-critic structure. To address the instability of the policy network's learning process and the shortcomings of the proximal policy optimization (PPO) algorithm, an improved PPO algorithm is proposed. To cope with the variability of the opponent's strategy during the policy network's self-learning, a new baseline policy model selection strategy based on model performance and model diversity is proposed. Experimental results show that the method is effective and stable for air-sea joint operation decision-making. In the 4th Wargaming Competition hosted by the Chinese Institute of Command and Control, the method achieved a 97% win rate over more than 100 rounds against rule-based decision-making algorithms and human players, about 20% higher than that of the rule-based decision-making algorithms.
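The abstract builds on PPO but does not describe the authors' modifications. As background only, here is a minimal sketch of the standard PPO clipped surrogate objective that such improvements start from. It is not the improved variant proposed in this paper. The function name `ppo_clip_objective` and the default `clip_eps = 0.2` are illustrative assumptions. The probability ratio r = pi_new(a|s) / pi_old(a|s) is computed from log-probabilities, clipped to [1 - eps, 1 + eps], and the smaller of the clipped and unclipped terms is averaged over the batch.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative default, not taken from the paper
) -> float:
    """Standard PPO clipped surrogate objective (to be maximized), batch-averaged.

    Hypothetical helper for illustration; it does not implement the
    paper's improved PPO algorithm.
    """
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), from log-probabilities.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the trust interval [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # One sample with ratio 1.5 and positive advantage, one with ratio 0.5
    # and negative advantage; both are limited by the clipping.
    print(ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]))
```

In the usage example, the first term is capped at 1.2 × 1.0 and the second is bounded at 0.8 × (-1.0), so the batch average is about 0.2. A training loop would minimize the negative of this value with respect to the policy parameters.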
Keywords: air-sea joint operation; deep reinforcement learning; proximal policy optimization; intelligent decision-making
CLC number: TP273 [Automation and Computer Technology: Detection Technology and Automatic Equipment]