Authors: LI Xin-peng (李新鹏); GAO Xin (高欣) [2]; HE Yang (何杨); YAN Bo (阎博); SUN Han-xu (孙汉旭) [2]; LI Jun-liang (李军良); XU Jian-hang (徐建航); LIU Zhen-yu (刘震宇) [5]; PANG Bo (庞博) [5]
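Affiliations: [1] State Grid Corporation of China, Beijing 100031, China; [2] College of Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China; [3] State Grid Jibei Electric Power Company Limited, Beijing 100054, China; [4] Nari Group (State Grid Electric Power Research Institute) Corporation, Beijing 100192, China; [5] State Grid Jibei Electric Power Company Limited Chengde Power Supply Company, Chengde 067000, Hebei, China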
Source: Microelectronics & Computer (《微电子学与计算机》), 2020, No. 3, pp. 14-19 (6 pages)
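Funding: State Grid Corporation of China Headquarters Science and Technology Project, "Research and Application of Key Technologies for Test Verification and Integrated Operation and Maintenance of Dispatching Control Systems" (DZ71-18-010).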
Abstract: Machine-learning classifiers tend to predict disk failures with low accuracy because the numbers of positive and negative samples in disk datasets are severely imbalanced. To address this problem, this paper proposes a disk failure prediction model based on an adaptively weighted Bagging-GBDT algorithm. First, a clustering-based stratified undersampling method draws multiple samples from the healthy-disk data, which avoids the tendency of random undersampling to discard potentially useful samples. Second, each drawn sample is combined with all failed-disk samples to form a sample subset, and a GBDT sub-classifier with high prediction accuracy is trained on each subset. Finally, the weight of each sub-model is determined adaptively from the class labels of the samples in the neighborhood of the test point, and the sub-models are combined by weighted hard voting into the final disk failure prediction model. Experiments on 8 KEEL imbalanced datasets show that, compared with typical existing imbalanced-learning algorithms, minority-class recall improves by 9.46% on average. Comparisons on a public disk dataset and on disk data from a dispatching system further verify the method's advantage in failure prediction rate.
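The sampling and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the sampling quotas or the weight formula, so the sketch assumes proportional per-cluster quotas and assumes each sub-model's weight is its accuracy on the k labelled samples nearest the test point. Cluster assignments are taken as given, the sub-models are plain callables standing in for trained GBDT classifiers, and every function name here is hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_undersample(healthy, cluster_of, n_keep, rng):
    """Draw roughly n_keep healthy samples, proportionally from each cluster.

    cluster_of[i] is the (precomputed) cluster id of healthy[i].
    """
    groups = defaultdict(list)
    for idx, sample in enumerate(healthy):
        groups[cluster_of[idx]].append(sample)
    picked = []
    for members in groups.values():
        # Assumed quota rule: proportional share, but at least one per cluster
        # so that no cluster is dropped entirely.
        quota = max(1, round(n_keep * len(members) / len(healthy)))
        picked.extend(rng.sample(members, min(quota, len(members))))
    return picked


def local_weights(models, x, labelled, k):
    """Assumed weight rule: accuracy of each sub-model on the k nearest labelled samples."""
    neighbours = sorted(labelled, key=lambda s: math.dist(s[0], x))[:k]
    return [sum(m(f) == y for f, y in neighbours) / len(neighbours) for m in models]


def weighted_hard_vote(models, x, labelled, k=3):
    """Each sub-model casts one class vote, scaled by its local weight."""
    score = defaultdict(float)
    for model, weight in zip(models, local_weights(models, x, labelled, k)):
        score[model(x)] += weight
    return max(score, key=score.get)


# Toy data: one feature; label 1 = failed disk, label 0 = healthy disk.
labelled = [((0.0,), 0), ((0.1,), 0), ((0.9,), 1), ((1.0,), 1), ((1.1,), 1)]
# Stand-ins for GBDT sub-classifiers: one useful model, two that always say "healthy".
models = [lambda f: int(f[0] > 0.5), lambda f: 0, lambda f: 0]

print(weighted_hard_vote(models, (0.95,), labelled))  # prints 1
```

In this toy case an unweighted majority vote would return 0, because two of the three stand-in models always predict "healthy". The local weighting gives those two models zero weight near the failed samples, so the single useful model decides the outcome.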
Keywords: disk failure prediction; imbalanced dataset; stratified undersampling; Bagging-GBDT; adaptive weighting
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]