Authors: ZHAO Chenyu (赵琛钰); XU Biao (胥彪); SONG Xun (宋勋); ZHAO Qilun (赵启伦); LI Shuang (李爽)
Affiliations: [1] College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China [2] Key Laboratory of Space Photoelectric Detection and Perception (Ministry of Industry and Information Technology), Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China [3] Shanghai AVICAS Avionics Systems Co. Ltd., Shanghai 201100, China [4] Beijing Institute of Electronic System Engineering, Beijing 100854, China
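Source: Aerospace Shanghai (Chinese & English) (《上海航天(中英文)》), 2024, No. 6, pp. 39-45 (7 pages)
Funding: Supported by the Fund of the Key Laboratory of Space Photoelectric Detection and Perception, Ministry of Industry and Information Technology (NJ2022025-05).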
Abstract: To address the difficulty of dynamic modeling and the lack of a known model for cross-domain interceptors flying across wide speed ranges and large airspace, a data-driven online reinforcement learning attitude control method is proposed. Inspired by zero-sum games, the disturbance is treated as part of the system input when designing the performance index function. The interceptor's actual control input aims to minimize the performance index function and improve system performance, whereas the disturbance has the opposite effect. A critic network is then constructed to learn the corresponding approximate solution online, and uncertainty is handled dynamically by updating the network weights. Unlike traditional model-based online reinforcement learning solution methods, the data-driven reinforcement learning method requires no dynamic model information about the interceptor system; it uses only the system's input and output data to drive the online learning and updating of the network weights. Finally, simulations verify the effectiveness of the proposed method.
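Keywords: cross-domain interceptor; online reinforcement learning; data-driven; zero-sum game; uncertainty
Classification number: V448 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]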
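The zero-sum formulation described in the abstract can be sketched as below. This is a generic form often used for such problems, not the paper's own equations: the abstract gives no formulas, so the state $x$, control $u$, disturbance $d$, weighting matrices $Q$ and $R$, attenuation level $\gamma$, basis vector $\phi$, and the quadratic cost itself are all assumptions made for illustration.

```latex
% Assumed generic zero-sum performance index (illustrative, not taken from the paper):
% the control u tries to minimize J, while the disturbance d tries to maximize it.
J(x_0, u, d) = \int_0^{\infty} \left( x^\top Q x + u^\top R u - \gamma^2 d^\top d \right) \mathrm{d}t,
\qquad
V^*(x_0) = \min_{u} \max_{d} J(x_0, u, d)

% Assumed critic approximation: a weight vector \hat{W}, updated online from
% input-output data, multiplies a hypothetical basis vector \phi(x).
\hat{V}(x) = \hat{W}^\top \phi(x)
```

Under these assumptions, the negative sign on the $\gamma^2 d^\top d$ term is what makes the disturbance act against the control input, which matches the opposing roles the abstract describes.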