Complex equipment troubleshooting strategy generation based on Bayesian networks and reinforcement learning

Authors: LIU Baoding, YU Jinsong[1], HAN Danyang, TANG Diyin[1], LI Xin[3] (School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China; China Academy of Launch Vehicle Technology, Beijing 100076, China)

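Affiliations: [1] School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191; [2] School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191; [3] China Academy of Launch Vehicle Technology, Beijing 100076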

Source: Journal of Beijing University of Aeronautics and Astronautics, 2024, No. 4, pp. 1354-1364 (11 pages)

Funding: National Key R&D Program of China (2018YFB1403300); National Natural Science Foundation of China (51875018, 71701008).

Abstract: Traditional heuristic troubleshooting decision methods take a long time to reach a decision and generate strategies with a high total cost. To address this, a troubleshooting strategy generation method for complex equipment that combines Bayesian networks (BN) with reinforcement learning (RL) is proposed. To make better use of model knowledge of complex equipment, a BN is used to represent troubleshooting knowledge; to stay closer to the actual condition of the equipment, the failure probabilities from the failure mode, effects, and criticality analysis (FMECA) are suitably converted and used as the prior probabilities of the BN. To generate troubleshooting strategies through the RL decision process, a method for converting the troubleshooting decision problem into an RL problem is proposed. To solve the resulting RL problem more effectively, observation-action pairs (O-A) are introduced to reduce the problem scale, and action masking is applied to handle the dynamic action space. Simulation results show that, under a unified performance metric, the proposed BN-RL method achieves higher metric values than traditional heuristic methods, demonstrating its effectiveness and superiority.
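The abstract does not specify the FMECA conversion rule, the network structure, or the RL algorithm, so the following Python sketch only illustrates three of the ingredients it names, under simplifying assumptions of our own: a single fault node whose prior is obtained by normalizing FMECA failure rates, deterministic pass/fail observations handled by Bayes' rule, and action masking over a flat discrete action space. The component names, failure rates, tests (`T1`, `T2`), helper functions, and Q-values are all hypothetical, and neither O-A pairs nor RL training are modeled.

```python
import math

# Hypothetical FMECA failure rates (e.g. failures per 10^6 h); not from the paper.
FAILURE_RATES = {"pump": 12.0, "valve": 6.0, "sensor": 18.0, "controller": 4.0}

# Hypothetical observations: each test fails iff the faulty component is in its set.
TESTS = {"T1": {"pump", "valve"}, "T2": {"sensor"}}

# Flat action space: every observation and every repair is one discrete action.
ACTIONS = [("test", t) for t in TESTS] + [("repair", c) for c in FAILURE_RATES]


def fmeca_prior(rates):
    """Assumed conversion: normalize failure rates into a single-fault prior P(F = c)."""
    total = sum(rates.values())
    return {c: r / total for c, r in rates.items()}


def bn_update(belief, covered, test_failed):
    """Exact posterior of the fault node F after one deterministic test result.

    The network is F -> T with P(T = fail | F = c) = 1 if c is covered, else 0,
    so Bayes' rule reduces to zeroing inconsistent components and renormalizing.
    """
    post = {c: p if (c in covered) == test_failed else 0.0 for c, p in belief.items()}
    z = sum(post.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {c: p / z for c, p in post.items()}


def action_mask(belief, done):
    """True for actions that are still meaningful in the current belief state."""
    mask = []
    for kind, name in ACTIONS:
        if (kind, name) in done:
            mask.append(False)  # never repeat an action
        elif kind == "repair":
            mask.append(belief[name] > 0.0)  # only repair plausible culprits
        else:
            p_fail = sum(belief[c] for c in TESTS[name])
            mask.append(0.0 < p_fail < 1.0)  # test must be able to split candidates
    return mask


def masked_greedy(q_values, mask):
    """Greedy action selection with invalid actions forced to -inf."""
    if not any(mask):
        raise ValueError("no valid action left")
    masked = [q if ok else -math.inf for q, ok in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)
```

For example, with these made-up numbers the prior is 0.3, 0.15, 0.45, and 0.1; after `T1` fails, the belief concentrates on `pump` (about 0.67) and `valve` (about 0.33). `T2` and the repairs of `sensor` and `controller` are then masked, so even if a hypothetical Q-vector `[0.9, 0.8, 0.1, 0.2, 0.95, 0.3]` scores `("repair", "sensor")` highest, the masked greedy choice is `("repair", "valve")`. Masking by setting scores to negative infinity keeps the network output size fixed while the set of admissible actions shrinks as troubleshooting proceeds.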

Keywords: reinforcement learning; Bayesian networks; troubleshooting; strategy generation; complex equipment; dynamic action space

CLC number: TP206.3 [Automation and Computer Technology: Detection Technology and Automatic Devices]

