Prediction Model of Disk Failure Based on an Adaptive Weighted Bagging-GBDT Algorithm under Imbalanced Datasets

Cited by: 8

Authors: LI Xin-peng (李新鹏); GAO Xin (高欣) [2]; HE Yang (何杨); YAN Bo (阎博); SUN Han-xu (孙汉旭) [2]; LI Jun-liang (李军良); XU Jian-hang (徐建航); LIU Zhen-yu (刘震宇) [5]; PANG Bo (庞博) [5]

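Affiliations: [1] State Grid Corporation of China, Beijing 100031, China; [2] College of Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China; [3] State Grid Jibei Electric Power Company Limited, Beijing 100054, China; [4] NARI Group (State Grid Electric Power Research Institute) Corporation, Beijing 100192, China; [5] State Grid Jibei Electric Power Company Limited Chengde Power Supply Company, Chengde 067000, Hebei, China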

Source: Microelectronics & Computer (《微电子学与计算机》), 2020, No. 3, pp. 14-19 (6 pages)

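Funding: State Grid Corporation of China headquarters science and technology project "Research and Application of Key Technologies for Test Verification and Integrated Operation and Maintenance of Dispatching Control Systems" (DZ71-18-010).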

Abstract: Disk datasets contain far more healthy samples than failed ones, and this severe class imbalance makes machine-learning classifiers prone to low failure-prediction accuracy. To address this, the paper proposes a disk failure prediction model based on an adaptive weighted Bagging-GBDT algorithm. First, a clustering-based stratified undersampling method draws repeated samples from the healthy-disk class, which avoids the tendency of random undersampling to discard potentially useful samples. Second, each drawn sample set is combined with all failed-disk samples to form multiple training subsets, and a GBDT sub-classifier with high prediction accuracy is trained on each subset. Finally, the weight of each sub-model is determined adaptively from the class labels of the samples in the neighborhood of the test point, and the final disk failure prediction model is obtained by weighted hard voting over the sub-models. Experiments on 8 KEEL imbalanced datasets show that, compared with existing typical imbalanced-learning algorithms, the recall of the minority class improves by 9.46% on average. Comparisons on public disk datasets and on disk data from a dispatching system further confirm the method's advantage in failure prediction rate.
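The sampling and voting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the proportional per-cluster draw, the use of local accuracy on the k nearest training points as the sub-model weight, and all function names (`stratified_undersample`, `neighborhood_weights`, `weighted_hard_vote`) are assumptions made for illustration. Cluster assignments are taken as given, and the sub-models are plain callables standing in for trained GBDT classifiers.

```python
import math
import random
from collections import defaultdict

def stratified_undersample(samples, cluster_ids, n_draw, rng):
    """Draw roughly n_draw majority-class samples, proportionally per cluster.

    Assumption: cluster_ids come from a clustering step run beforehand.
    """
    by_cluster = defaultdict(list)
    for sample, cid in zip(samples, cluster_ids):
        by_cluster[cid].append(sample)
    picked = []
    for members in by_cluster.values():
        # Keep at least one sample per cluster so no region is dropped entirely.
        k = max(1, round(n_draw * len(members) / len(samples)))
        picked.extend(rng.sample(members, min(k, len(members))))
    return picked

def neighborhood_weights(models, x, train_X, train_y, k):
    """Assumed weighting rule: each sub-model's accuracy on the k training
    points nearest to the test point x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(x, train_X[i]))[:k]
    return [sum(m(train_X[i]) == train_y[i] for i in nearest) / len(nearest)
            for m in models]

def weighted_hard_vote(models, weights, x):
    """Each sub-model casts its predicted label with its weight; the label
    with the largest total weight wins."""
    votes = defaultdict(float)
    for model, weight in zip(models, weights):
        votes[model(x)] += weight
    return max(votes, key=votes.get)

# Toy data: label 1 = failed disk (minority), label 0 = healthy disk.
train_X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
train_y = [0, 0, 0, 1, 1]

# Stand-ins for trained GBDT sub-models (hypothetical).
models = [
    lambda p: 1 if p[0] > 5 else 0,  # separates the two classes
    lambda p: 0,                     # always predicts "healthy"
    lambda p: 0,                     # always predicts "healthy"
]

x_test = (10.5,)
weights = neighborhood_weights(models, x_test, train_X, train_y, k=2)
label = weighted_hard_vote(models, weights, x_test)

# Undersample 10 healthy samples split into two clusters of size 8 and 2.
subset = stratified_undersample(list(range(10)), [0] * 8 + [1] * 2,
                                n_draw=5, rng=random.Random(0))
```

In this toy case an unweighted majority vote would label `x_test` as healthy, because two of the three sub-models always predict 0. Under the assumed rule those two models score zero on the two failed-disk neighbors, so the weighted vote returns 1.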

Keywords: disk failure prediction; imbalanced dataset; stratified undersampling; Bagging-GBDT; adaptive weighting

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
