Authors: ZHANG Ying (张英), WEI Minfeng (韦闽峰) [2,3,4], WANG Shihui (王世会) [2,3], TAO Leiyan (陶磊岩) [5], CAO Jian (曹健), ZHANG Xing (张兴) [1]
Affiliations: [1] School of Software and Microelectronics, Peking University, Beijing 100871, China; [2] Beijing Aerospace Automatic Control Institute, Beijing 100854, China; [3] National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing 100854, China; [4] School of Automation, Beijing Institute of Technology, Beijing 100081, China; [5] Beijing Institute of Remote Sensing Equipment, Beijing 100854, China
Source: Journal of Xidian University (《西安电子科技大学学报》), 2020, No. 2, pp. 75-82 (8 pages)
Funding: National Natural Science Foundation of China (51877008).
Abstract: To improve the reliability of an aircraft control system during long-term in-orbit flight, a multi-mode control scheme based on reinforcement learning is proposed. The system comprises a sensor module, a control module, and an execution module. The sensor module feeds the flight data sensed by the aircraft to the control module in real time. These data fall into two kinds: multidimensional structured floating-point data with historical correlation that can be used directly for aircraft control, and physical quantities specific to a particular sensor. The control module uses a real-time parallel decision mechanism and is divided into an input layer, a feature extraction layer, and a fully connected layer. The execution module receives the driving data output by the control module in real time, which include the optimal state value used for decision-making and the action output value used for evaluation. The system decides which specific execution modules to use based on the optimal return value used for decision-making, and the output of each selected execution module depends on the action output value used for evaluation. The system enables the aircraft to complete long-term in-orbit operation in the multi-mode input and output state with a fast response of 15 ms and a performance per watt of 5.23 GOP/s/W.
Classification: TN911.22 [Electronics and Telecommunications: Communication and Information Systems]
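The data flow described in the abstract (sensor data into an input layer, a feature extraction layer, then a fully connected layer that emits one decision value and one action value per execution module) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the random untrained weights, the ReLU and tanh activations, the two separate output heads, and the top-k selection rule are all assumptions made only for illustration, and no reinforcement learning training loop is shown.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(rng, n_out, n_in):
    """Random placeholder weights (assumption: a trained policy would replace these)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


class ControlModule:
    """Hypothetical control module: input -> feature extraction -> fully connected heads."""

    def __init__(self, n_inputs, n_features, n_modules, seed=0):
        rng = random.Random(seed)
        self.feature_layer = init_layer(rng, n_features, n_inputs)
        # Two heads are an assumption: one value per execution module for
        # deciding which modules to use, one for the command sent to each.
        self.decision_head = init_layer(rng, n_modules, n_features)
        self.action_head = init_layer(rng, n_modules, n_features)

    def forward(self, sensor_data):
        features = relu(dense(sensor_data, *self.feature_layer))
        decision_values = dense(features, *self.decision_head)
        # tanh keeps each action output in [-1, 1] (illustrative choice).
        action_values = [math.tanh(a) for a in dense(features, *self.action_head)]
        return decision_values, action_values


def select_modules(decision_values, action_values, top_k=1):
    """Pick the top_k execution modules by decision value; map index -> action output."""
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: decision_values[i], reverse=True)
    return {i: action_values[i] for i in ranked[:top_k]}


if __name__ == "__main__":
    control = ControlModule(n_inputs=4, n_features=8, n_modules=3)
    decision, action = control.forward([0.1, -0.2, 0.3, 0.05])
    print(select_modules(decision, action, top_k=2))
```

In this sketch the decision values only determine which execution modules are driven, while the action values determine what each selected module outputs, mirroring the split between the two kinds of control output that the abstract describes.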