Authors: SONG Shi-jun [1], FAN Min [2]
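Affiliations: [1] School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 610031, China; [2] School of Civil Engineering, Southwest Jiaotong University, Chengdu 610031, China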
Source: Journal of Jilin University (Engineering and Technology Edition), 2023, No. 9, pp. 2659-2665 (7 pages)
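Funding: National Natural Science Foundation of China, Major Special Project (71942006); China Railway Major Bridge Reconnaissance & Design Institute Co., Ltd., research project (KYL202203-0086).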
Abstract: Big data anomaly detection is easily disturbed by edge data, which lowers detection accuracy. To address this, a big data anomaly detection model based on the random forest algorithm is proposed. First, an improved k-means algorithm is used to cluster the data, and principal component analysis is used to extract its features. Then a detection model based on a random forest classifier is built: the extracted features are fed into the model, decision trees are constructed, and the classification accuracy of the classifier is improved by dynamically updating the weights of the decision trees. Finally, the classification results are output, completing the anomaly detection. Experimental results show that the detection time of the proposed model is about 25 s, the average anomaly detection accuracy is 91%, and the false alarm rate is 4.5%.
Keywords: big data clustering; feature extraction; principal component analysis; random forest classifier; decision tree; weight updating
Classification code: TM714 [Electrical Engineering: Power Systems and Automation]
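
The abstract's final stage, a tree ensemble whose votes are weighted and whose weights are updated over time, can be illustrated with a minimal sketch. The record does not give the paper's actual update rule, tree construction, or data, so everything here is an assumption for illustration: depth-1 trees (stumps) stand in for full decision trees, out-of-bag accuracy is used as the initial weight, new labeled batches are blended in with a hypothetical `rate` parameter, and the clustering and PCA stages are omitted. All function names and the toy data are hypothetical.

```python
import random

def train_stump(X, y, feature):
    """Fit a depth-1 tree on one feature: keep the threshold/polarity with fewest errors."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for polarity in (1, -1):
            preds = [1 if polarity * (row[feature] - t) > 0 else 0 for row in X]
            errors = sum(p != label for p, label in zip(preds, y))
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, threshold, polarity = best
    return feature, threshold, polarity

def stump_predict(stump, row):
    feature, threshold, polarity = stump
    return 1 if polarity * (row[feature] - threshold) > 0 else 0

def accuracy(stump, X, y):
    return sum(stump_predict(stump, r) == label for r, label in zip(X, y)) / len(X)

def train_weighted_forest(X, y, n_trees=25, seed=0):
    """Bootstrap + random feature per tree; initial weight = out-of-bag accuracy (assumption)."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        oob = sorted(set(range(n)) - set(idx))          # rows this tree never saw
        stump = train_stump([X[i] for i in idx], [y[i] for i in idx],
                            rng.randrange(n_features))
        w = accuracy(stump, [X[i] for i in oob], [y[i] for i in oob]) if oob else 0.5
        forest.append((stump, w))
    return forest

def update_weights(forest, X_new, y_new, rate=0.5):
    """Blend each tree's weight with its accuracy on a new labeled batch (assumed rule)."""
    return [(s, (1 - rate) * w + rate * accuracy(s, X_new, y_new)) for s, w in forest]

def predict(forest, row):
    """Weighted vote: label 1 (anomaly) counts +w, label 0 (normal) counts -w."""
    score = sum(w if stump_predict(s, row) == 1 else -w for s, w in forest)
    return 1 if score > 0 else 0

# Toy data (hypothetical): 80 normal points near the origin, 20 anomalies near (5, 5).
normal = [[0.1 * (i % 10), 0.1 * (i // 10)] for i in range(80)]
anomalous = [[5 + 0.1 * (i % 5), 5 + 0.1 * (i // 5)] for i in range(20)]
X, y = normal + anomalous, [0] * 80 + [1] * 20

forest = train_weighted_forest(X, y)
forest = update_weights(forest, [[0.3, 0.3], [5.1, 5.3]], [0, 1])
print(predict(forest, [0.2, 0.2]), predict(forest, [5.2, 5.2]))
```

Weighting by held-out accuracy lets trees that generalize poorly contribute less to the final vote. In a fuller version of the pipeline, the inputs would be the features produced by the clustering and PCA steps described in the abstract rather than raw coordinates.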