Outlier Detection Method Based on Isolation Forest and LOF

Times cited: 6


Authors: Ling Li; Cheng Zhangyu; Zou Chengming [2,3]

Affiliations: [1] School of Information Engineering, Wuhan Institute of Technology, Wuhan 431400, Hubei, China; [2] School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, Hubei, China; [3] Hubei Key Laboratory of Transportation Internet of Things, Wuhan 430070, Hubei, China

Source: Computer Applications and Software, 2022, No. 12, pp. 278-283 (6 pages)

Funding: National Key Research and Development Program of China (2018YFC0704303).

Abstract: A single outlier detection method applies the same anomaly criterion to all data, so it cannot take both global and local information into account, and it suffers from insufficient accuracy and low efficiency. To address these problems, we propose an outlier detection method that combines isolation forest (iForest) and the local outlier factor (LOF), named FSIF-HDLOF. The efficient iForest is first used to prune the original dataset, and LOF is then applied to the pruned dataset for more accurate detection. In both the pruning and detection phases, the algorithm introduces improvements that address the respective shortcomings of iForest and LOF. A weighted fusion formula, which combines the anomaly information each data point obtains in the pruning and detection phases, is defined to determine the outliers. Experimental results show that FSIF-HDLOF achieves a good balance between detection accuracy and efficiency, with a particularly large accuracy advantage on datasets that have a large data volume and a low outlier ratio.
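The two-stage idea in the abstract (iForest pruning, then LOF on the pruned set, then a weighted fusion of both scores) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' FSIF-HDLOF: it uses the standard iForest and LOF without the paper's improvements, and the pruning rule (keep the top `keep_ratio` fraction of points by iForest score), the min-max normalization, and the fusion `weight` are hypothetical choices, because the abstract does not give the actual fusion formula. All function names are illustrative.

```python
import math
import random


def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_itree(points, depth, max_depth, rng):
    """Grow one isolation tree using random axis-parallel splits."""
    if len(points) <= 1 or depth >= max_depth:
        return ("leaf", len(points))
    q = rng.randrange(len(points[0]))  # random feature
    lo = min(p[q] for p in points)
    hi = max(p[q] for p in points)
    if lo == hi:  # feature is constant here, so stop splitting
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)  # random split value in [lo, hi)
    left = [p for p in points if p[q] < split]
    right = [p for p in points if p[q] >= split]
    return ("node", q, split,
            build_itree(left, depth + 1, max_depth, rng),
            build_itree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Edges traversed to reach a leaf, plus c(leaf size) for unbuilt subtrees."""
    if tree[0] == "leaf":
        return depth + avg_path_length(tree[1])
    _, q, split, left, right = tree
    return path_length(x, left if x[q] < split else right, depth + 1)


def iforest_scores(data, n_trees=100, sample_size=64, seed=0):
    """Standard iForest anomaly score s(x) = 2^(-E[h(x)] / c(psi)); higher = more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # assumes len(data) >= 2
    max_depth = math.ceil(math.log2(psi))
    trees = [build_itree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    c = avg_path_length(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / c) for x in data]


def lof_scores(points, k=10):
    """Standard LOF with brute-force k-NN (ties broken by index); higher = more anomalous."""
    n = len(points)
    k = min(k, n - 1)
    dist = [[math.dist(a, b) for b in points] for a in points]
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]
    lrd = []  # local reachability density
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        lrd.append(1.0 / (sum(reach) / len(reach) + 1e-12))  # epsilon guards duplicates
    return [sum(lrd[j] for j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]


def minmax(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def fused_outlier_scores(data, keep_ratio=0.2, k=10, weight=0.5, seed=0):
    """Hypothetical pipeline: returns {index: fused score} for the retained candidates."""
    s_if = iforest_scores(data, seed=seed)
    n_keep = max(k + 1, int(len(data) * keep_ratio))
    # Stage 1 (pruning, assumed rule): keep the points iForest isolates most easily.
    cand = sorted(range(len(data)), key=lambda i: s_if[i], reverse=True)[:n_keep]
    # Stage 2 (detection): LOF computed on the pruned candidate set only.
    s_lof = lof_scores([data[i] for i in cand], k=k)
    # Weighted fusion of the two min-max normalized scores (weight is an assumption).
    fused = [weight * a + (1 - weight) * b
             for a, b in zip(minmax([s_if[i] for i in cand]), minmax(s_lof))]
    return dict(zip(cand, fused))


if __name__ == "__main__":
    gen = random.Random(42)
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
    data += [(15.0, 15.0), (-15.0, 15.0), (15.0, -15.0)]  # injected outliers
    scores = fused_outlier_scores(data)
    print(sorted(sorted(scores, key=scores.get, reverse=True)[:3]))
```

Points discarded in stage 1 receive no fused score and are treated as inliers. In this sketch, running LOF only on the retained candidates shrinks the brute-force neighbor search from O(n²) to O(m²), with m = `keep_ratio`·n, which is where pruning saves time; the trade-off is that LOF densities are measured relative to the candidates rather than the full dataset.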

Keywords: outlier detection; large-scale multidimensional data; isolation forest; data dimensionality reduction; local outlier factor

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

 
