Authors: 崔雅萌 (CUI Ya-Meng); 王会霞 (WANG Hui-Xia) [1,2]; 郑春胜 (ZHENG Chun-Sheng); 胡瑞光 (HU Rui-Guang)
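Affiliations: [1] Beijing Aerospace Automatic Control Institute, Beijing 100854, China; [2] National Key Laboratory of Science and Technology on Aerospace Intelligence Control, Beijing 100854, China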
Source: Journal of Command and Control (《指挥与控制学报》), 2021, No. 4, pp. 403-414 (12 pages)
Funding: Supported by the National Defense Basic Scientific Research Fund (JCKY2019203C029).
Abstract: When a single red-side aircraft faces multiple blue-side interceptors, it is difficult for it to penetrate successfully because of limits on its individual capability and its penetration means. To address this problem, an intelligent game strategy based on reinforcement learning is designed that can evade multiple blue interceptors. First, a mathematical model of the red aircraft is established, covering behaviors such as decoy release, maneuver adjustment, and attitude adjustment. Second, a mathematical model of the blue interceptors is established, in which the blue side intercepts the red aircraft using proportional navigation guidance. Next, an aircraft escape algorithm based on deep deterministic policy gradient (DDPG) is designed. To speed up the agent's learning and the algorithm's convergence, training combines pre-training on prior knowledge with a prioritized experience replay mechanism. Simulation results show that the algorithm enables the red aircraft to escape successfully when facing multiple blue interceptors.
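The abstract names prioritized experience replay but gives no implementation details. As a rough illustration only, the sketch below shows the commonly used proportional variant, in which a transition is sampled with probability proportional to its priority raised to a power `alpha`, and importance-sampling weights correct the resulting bias. The class name, method names, and the default values of `alpha`, `beta`, and `eps` are assumptions made for this example and are not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay buffer (illustrative, hypothetical API)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity   # maximum number of stored transitions
        self.alpha = alpha         # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps             # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0         # ring-buffer write position

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so they are likely to be replayed soon after being stored.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.items)), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^(-beta), normalized by their maximum.
        n = len(self.items)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.items[i] for i in idxs], weights

    def update(self, idxs, td_errors):
        # After a learning step, set each sampled priority to |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG-style training loop, the critic's temporal-difference errors for a sampled batch would typically be passed to `update`, and the returned weights would scale the critic loss. Whether the paper follows this exact scheme is not stated in the abstract.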
Keywords: aircraft attack-defense confrontation; behavior model; target assignment; deep reinforcement learning; prior knowledge
Classification: O225 [Science: Operations Research and Cybernetics]; E91 [Science: Mathematics]