Aircraft reinforcement learning multi-mode control in orbit

Cited by: 1

Authors: ZHANG Ying; WEI Minfeng [2,3,4]; WANG Shihui [2,3]; TAO Leiyan [5]; CAO Jian; ZHANG Xing [1]

Affiliations: [1] School of Software and Microelectronics, Peking University, Beijing 100871, China; [2] Beijing Aerospace Automatic Control Institute, Beijing 100854, China; [3] National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing 100854, China; [4] School of Automation, Beijing Institute of Technology, Beijing 100081, China; [5] Beijing Institute of Remote Sensing Equipment, Beijing 100854, China

Source: Journal of Xidian University, 2020, No. 2, pp. 75-82 (8 pages)

Funding: National Natural Science Foundation of China (51877008).

Abstract: To improve the reliability of an aircraft control system during long-term in-orbit flight, a multi-mode control scheme based on reinforcement learning is proposed. The system consists of a sensor module, a control module and an execution module. The sensor module feeds the flight data sensed by the aircraft to the control module in real time. These data comprise multidimensional structured floating-point data with historical correlation, which can be used directly for aircraft control, together with physical quantities unique to a particular sensor. The control module uses a real-time parallel decision mechanism and is divided into an input layer, a feature extraction layer and a fully connected layer. The execution module receives the drive data output by the control module in real time, which include the optimal state value used for decision-making and the action output value used for evaluation. The system decides which specific execution modules to use according to the optimal return value used for decision-making, and the output of each selected execution module depends on the action output value used for evaluation. In the multi-mode input and output state, the system gives the aircraft a fast response of 15 ms and a performance per watt of 5.23 GOP/s/W.
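The data flow described in the abstract (input layer, feature extraction layer, fully connected layer, then selection of execution modules and their output values) can be pictured with a minimal sketch. This is not the authors' implementation: the class name `MultiModeController`, the layer sizes, the random untrained weights, the `tanh` output scaling and the top-k selection rule are all assumptions chosen only for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, x):
    """Multiply a weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class MultiModeController:
    """Hypothetical control module: input -> features -> two output heads."""

    def __init__(self, n_channels, window, n_features, n_actuators, seed=0):
        rng = random.Random(seed)
        self.n_channels = n_channels
        self.window = window
        # Feature extraction layer over the flattened sensor history.
        self.w_feat = make_matrix(n_features, n_channels * window, rng)
        # Fully connected heads: one score and one output per execution module.
        self.w_decision = make_matrix(n_actuators, n_features, rng)
        self.w_action = make_matrix(n_actuators, n_features, rng)

    def forward(self, history):
        if len(history) != self.window or any(
            len(sample) != self.n_channels for sample in history
        ):
            raise ValueError("history must be window x n_channels")
        # Input layer: flatten the time-ordered floating-point samples.
        x = [value for sample in history for value in sample]
        features = [max(0.0, f) for f in matvec(self.w_feat, x)]  # ReLU
        decision = matvec(self.w_decision, features)
        action = [math.tanh(a) for a in matvec(self.w_action, features)]
        return decision, action

    def drive(self, history, k=1):
        """Pick the k highest-scoring execution modules and their outputs."""
        decision, action = self.forward(history)
        ranked = sorted(range(len(decision)), key=lambda i: decision[i], reverse=True)
        return {i: action[i] for i in sorted(ranked[:k])}


controller = MultiModeController(n_channels=3, window=4, n_features=8, n_actuators=5)
history = [[0.1 * t, -0.2, 0.05 * t] for t in range(4)]
commands = controller.drive(history, k=2)
```

Here `commands` maps the index of each selected execution module to a drive value in [-1, 1]. Keeping the decision scores and the action outputs as two separate heads mirrors the abstract's split between the value used to choose execution modules and the value that sets their output.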

Keywords: aircraft; control system; multi-mode; reinforcement learning

Classification: TN911.22 [Electronics and Telecommunications: Communication and Information Systems]
