Range-Based Isolation Forest Outlier Detection Algorithm  Cited by: 1

Outlier Detection Algorithm of Isolated Forest Based on Range


Authors: 刘俊成 (LIU Juncheng), 董东 (DONG Dong)[1] (College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China)

Affiliation: [1] College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang, Hebei 050024, China

Source: Software Guide (《软件导刊》), 2023, No. 8, pp. 93-98 (6 pages)

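Funding: Special Supporting Project of the 14th Five-Year Plan of the National Education Examinations Authority, Ministry of Education (NEEA2021064).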

Abstract: The isolation forest (iForest) algorithm, which is based on random partitioning, does not consider the probability that a subsample contains outliers. To address this problem, an isolation forest algorithm based on range (r-iForest) is proposed. During random sub-sampling, r-iForest uses the range to screen candidate sample subsets, so that the selected subsets are more likely to contain more outliers. In addition, while each isolation tree is being built, the ratio of a child node's sample size to that of its immediate parent node is used to control the shape of the tree, which prevents poorly performing isolation trees from being generated. In comparisons with eight outlier detection algorithms on seven public datasets from the Outlier Detection DataSets (ODDS) library and on the KDD CUP 99 dataset, r-iForest achieves accuracy 2 to 40 percentage points higher than the other algorithms and takes about 15% less time than iForest.
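The abstract does not give the exact range criterion or the child-to-parent ratio rule, so the following is only a minimal sketch under stated assumptions. It assumes that "range-based screening" means drawing several random candidate subsamples and keeping the one with the largest total per-feature range (max minus min, normalized by the full dataset's range), on the intuition that subsets containing outliers tend to span a wider range. The tree construction and anomaly score follow standard iForest; the paper's child/parent sample-size control is not reproduced because its rule is not stated here. All function and parameter names (`select_subsample`, `candidates`, `fit_forest`, and so on) are hypothetical.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points (standard iForest normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1) via Euler-Mascheroni constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def select_subsample(data, psi, candidates, rng):
    """ASSUMPTION: keep the candidate subsample with the largest normalized per-feature range."""
    dims = len(data[0])
    # Global range per feature; 0.0 is replaced by 1.0 to avoid division by zero.
    global_range = [max(p[d] for p in data) - min(p[d] for p in data) or 1.0 for d in range(dims)]
    best, best_score = None, -1.0
    for _ in range(candidates):
        subset = rng.sample(data, psi)
        score = sum(
            (max(p[d] for p in subset) - min(p[d] for p in subset)) / global_range[d]
            for d in range(dims)
        )
        if score > best_score:
            best, best_score = subset, score
    return best


def build_tree(points, depth, max_depth, rng):
    """Plain isolation tree: random feature, random split value between that feature's min and max."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dims = len(points[0])
    # Only split on features that still vary within this node.
    spread = [d for d in range(dims) if min(p[d] for p in points) < max(p[d] for p in points)]
    if not spread:
        return ("leaf", len(points))
    d = rng.choice(spread)
    lo = min(p[d] for p in points)
    hi = max(p[d] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[d] < split]
    right = [p for p in points if p[d] >= split]
    return ("node", d, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for unresolved points in that leaf."""
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, d, split, left, right = node
    return path_length(x, left if x[d] < split else right, depth + 1)


def fit_forest(data, n_trees=100, psi=64, candidates=5, seed=0):
    """Build n_trees trees, each on a range-screened subsample of size psi (psi must be >= 2)."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    max_depth = math.ceil(math.log2(psi))  # standard iForest height limit
    trees = [build_tree(select_subsample(data, psi, candidates, rng), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Standard iForest score s = 2^(-E[h(x)] / c(psi)); values near 1 suggest an outlier."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c_factor(psi))


if __name__ == "__main__":
    gen = random.Random(42)
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(300)] + [(8.0, 8.0)]
    trees, psi = fit_forest(data, n_trees=100, psi=64, candidates=5, seed=1)
    print("outlier:", anomaly_score((8.0, 8.0), trees, psi))
    print("center: ", anomaly_score((0.0, 0.0), trees, psi))
```

Setting `candidates=1` reduces the sketch to ordinary random sub-sampling, which makes it easy to compare the assumed range screening against plain iForest on the same data.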

Keywords: random sub-sampling; outlier detection; isolation forest algorithm; range

CLC number: TP319 [Automation and Computer Technology: Computer Software and Theory]
