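Authors: WANG Cheng [1], DI Xuan (School of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Affiliation: [1] School of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Source: Computer Technology and Development (《计算机技术与发展》), 2021, No. 6, pp. 13-18 (6 pages)
Funding: Natural Science Foundation of Jiangsu Province (BK20141428).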
Abstract: Anomaly detection has been one of the active research topics in data mining in recent years. The Isolation Forest algorithm is an efficient unsupervised anomaly detection algorithm that handles high-dimensional, large-scale data well. When it computes the anomaly score of a test sample, however, it uses the sample's average path length across the forest, which ignores differences in how well individual isolation trees detect anomalies. In addition, building a large number of isolation trees on large-scale data consumes considerable memory and time. To address these two shortcomings, a parallelized improved Isolation Forest algorithm is proposed. Each isolation tree is weighted by the standard deviation of its path lengths when the anomaly score is computed, and the algorithm is parallelized on the Spark platform. Comparison experiments on public datasets, together with parallel-performance experiments under multiple parameter configurations, show that the proposed algorithm improves the accuracy of anomaly detection, achieves good parallel performance, and can efficiently process large-scale datasets that require a large number of isolation trees.
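Keywords: anomaly detection; Isolation Forest algorithm; isolation tree; Spark; parallelization
Classification code: TP301. [Automation and Computer Technology: Computer System Architecture]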
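The weighting idea in the abstract can be illustrated with a small single-machine sketch. The record does not give the paper's exact formula, so the code below makes an assumption: each tree's weight is proportional to the standard deviation of the path lengths it assigns to a reference sample, and the weighted mean path length replaces the plain mean in the usual Isolation Forest score. All function names (`build_tree`, `tree_weights`, `anomaly_score`, and so on) are hypothetical, and Spark is not used here.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(node, x, depth=0):
    """Depth at which x reaches a leaf, plus the usual adjustment for leaf size."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(child, x, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on its own random subsample."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def tree_weights(trees, reference):
    """ASSUMPTION: weight is proportional to the std of a tree's path lengths."""
    stds = [statistics.pstdev([path_length(t, x) for x in reference])
            for t in trees]
    total = sum(stds)
    if total == 0:
        return [1.0 / len(trees)] * len(trees)
    return [s / total for s in stds]


def anomaly_score(trees, weights, psi, x):
    """Isolation Forest score using a weighted mean path length."""
    h = sum(w * path_length(t, x) for w, t in zip(weights, trees))
    return 2.0 ** (-h / c_factor(psi))


# Toy data: one Gaussian cluster plus a single far-away point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data.append((8.0, 8.0))

trees, psi = fit_forest(data)
weights = tree_weights(trees, data)
outlier_score = anomaly_score(trees, weights, psi, (8.0, 8.0))
inlier_score = anomaly_score(trees, weights, psi, (0.0, 0.0))
print(outlier_score > inlier_score)
```

Each tree is built from its own subsample and scored independently of the others, which is the property that makes the tree-building and scoring steps natural candidates for distribution on a platform such as Spark, as the abstract describes.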