Fast Outlier Detection Algorithm Based on Isolation Forest  Cited by: 8


Authors: FENG Jia-chen (冯嘉琛); CAI Jiang-hui (蔡江辉) [1]; YANG Hai-feng (杨海峰) [1] (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)

Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Science and Technology

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2019, No. 11, pp. 2418-2423 (6 pages)

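Funding: Supported by the National Natural Science Foundation of China (U1731126) and the Key Research and Development Program of Shanxi Province (201803D121059)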

Abstract: Isolation Forest is a relatively efficient outlier detection algorithm, but the construction of its isolation trees involves a high degree of randomness, which may degrade the algorithm's performance. To address this problem, this paper proposes a fast outlier detection algorithm based on Isolation Forest. The algorithm first selects isolation tree samples heuristically, that is, it introduces a judgment condition to decide whether an isolation tree should be built. Then, during tree construction, it chooses specific cut points to insert data into the corresponding leaf nodes, which reduces the influence of random selection on performance. Finally, the isolation trees are combined into an isolation forest, the outlier score s of each isolated leaf node is computed, and the data objects with the largest outlier scores are reported as outliers. The proposed algorithm is validated on UCI data sets. The results show that it effectively improves the efficiency of outlier detection while preserving detection accuracy.
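For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal sketch of the standard Isolation Forest, with fully random split dimensions and cut points and the usual score s = 2^(-E[h(x)] / c(psi)). It is not the paper's improved method: the abstract does not specify the heuristic judgment condition or the rule for choosing cut points, so neither is reproduced here. The names `fit_forest`, `outlier_score`, `c`, and all parameter defaults are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random dimension at a random cut point."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant attribute; stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)  # the purely random choice the paper aims to reduce
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each from a random subsample of data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # assumed to be at least 2
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def path_length(x, node):
    """Depth at which x lands, plus an adjustment for unsplit leaf contents."""
    depth = 0
    while "split" in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c(node["size"])


def outlier_score(x, forest):
    """Score in (0, 1]; values closer to 1 indicate more outlying points."""
    trees, psi = forest
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(psi))
```

In this sketch, `data` is a list of equal-length numeric tuples. Calling `fit_forest(data)` and then ranking points by `outlier_score` in descending order corresponds to the final step described in the abstract, where the objects with the largest scores are taken as outliers.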

Keywords: outlier detection; isolation tree; isolation forest; heuristic

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
