Design of big data anomaly detection model based on random forest algorithm  Cited by: 9


Authors: SONG Shi-jun [1], FAN Min [2]

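Affiliations: [1] School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 610031, China; [2] School of Civil Engineering, Southwest Jiaotong University, Chengdu 610031, China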

Source: Journal of Jilin University (Engineering and Technology Edition), 2023, No. 9, pp. 2659-2665 (7 pages)

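Funding: Major Special Project of the National Natural Science Foundation of China (71942006); Research Project of China Railway Major Bridge Reconnaissance & Design Institute Co., Ltd. (KYL202203-0086).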

Abstract: Big data anomaly detection is easily disturbed by edge data, which lowers detection accuracy. To address this problem, a big data anomaly detection model based on the random forest algorithm is proposed. First, an improved k-means algorithm is used to cluster the big data, and principal component analysis is used to extract its features. Then, an anomaly detection model based on a random forest classifier is constructed: the extracted features are fed into the model, decision trees are built, and the classification accuracy of the classifier is improved by dynamically updating the weights of the decision trees. Finally, the classification results are output, completing the anomaly detection. Experimental results show that the detection time of the proposed model is about 25 s, the average anomaly detection accuracy is 91%, and the false alarm rate is 4.5%.
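The abstract does not state how the decision-tree weights are updated, so the following is only a minimal sketch of one plausible reading: a weighted vote over trees, with a multiplicative update (in the style of the weighted majority algorithm) that shrinks the weight of any tree that misclassifies a labelled sample. The names `weighted_vote`, `update_weights`, the factor `beta`, and the threshold rules standing in for trained trees are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

# A "tree" here is any callable mapping a feature vector to 1 (anomaly) or 0 (normal).
Classifier = Callable[[Sequence[float]], int]


def weighted_vote(trees: List[Classifier], weights: List[float], x: Sequence[float]) -> int:
    """Return the label (0 or 1) that collects the larger total weight."""
    score = {0: 0.0, 1: 0.0}
    for tree, w in zip(trees, weights):
        score[tree(x)] += w
    return 1 if score[1] > score[0] else 0


def update_weights(trees: List[Classifier], weights: List[float],
                   x: Sequence[float], y_true: int, beta: float = 0.5) -> List[float]:
    """Assumed rule: scale the weight of each wrong tree by beta, then renormalise."""
    scaled = [w * (beta if tree(x) != y_true else 1.0) for tree, w in zip(trees, weights)]
    total = sum(scaled)
    return [w / total for w in scaled]


# Hypothetical stand-ins for trained decision trees: single-threshold rules.
trees: List[Classifier] = [
    lambda x: int(x[0] > 3.0),
    lambda x: int(x[1] > 3.0),
    lambda x: int(x[0] + x[1] > 100.0),  # deliberately poor: almost never flags anything
]
weights = [1.0 / len(trees)] * len(trees)

# Small labelled stream of (features, label) pairs used to adapt the weights.
stream = [([5.0, 4.0], 1), ([0.5, 0.2], 0), ([4.0, 6.0], 1), ([3.5, 3.2], 1)]
for x, y in stream:
    weights = update_weights(trees, weights, x, y)

prediction = weighted_vote(trees, weights, [4.2, 3.9])
```

After the stream is processed, the poorly performing third rule carries much less weight than the other two, so it has little influence on the final vote. In the pipeline the abstract describes, the inputs would be the PCA features of the clustered data rather than raw two-dimensional points.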

Keywords: big data clustering; feature extraction; principal component analysis; random forest classifier; decision tree; weight updating

Classification code: TM714 [Electrical Engineering - Power Systems and Automation]

 
