Hardware Fault Detection and Analysis of Large-Scale Parallel Computer Systems Based on Machine Learning

Author: LIU Zhaoxia (刘照霞) (Convenience Service Center, Xujiahu Town, Yishui County, Linyi, Shandong 276402, China)

Affiliation: [1] Convenience Service Center, Xujiahu Town, Yishui County, Linyi, Shandong 276402, China

Source: Chinese Journal of Computer Application (《计算机应用文摘》), 2023, No. 15, pp. 94-97 (4 pages)

Abstract: Computers are important production tools in many fields, and a hardware failure directly affects their working state, so the topic warrants detailed study. This paper takes hardware fault detection in large-scale parallel computer systems as its research object. It first builds a hardware fault detection model, then examines the principles of hardware fault analysis and the feature selection process, and presents several common machine-learning-based fault detection algorithms. Finally, it analyzes the experimental results of the different fault detection algorithms in detail, with the aim of improving the efficiency of hardware fault detection in large-scale parallel computer systems and supporting the development of related fields.

Keywords: machine learning; parallel computer system; hardware fault; fault detection

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
