Outlier detection algorithm based on hologram stationary distribution factor


Authors: ZHANG Zhongping[1,2,3], GUO Xin, ZHANG Yuting, ZHANG Ruibo

Affiliations: [1] School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China; [2] Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province (Yanshan University), Qinhuangdao, Hebei 066004, China; [3] Key Laboratory of Software Engineering of Hebei Province (Yanshan University), Qinhuangdao, Hebei 066004, China; [4] School of International Education, Wuhan University of Technology, Wuhan, Hubei 430070, China

Source: Journal of Computer Applications, 2023, No. 6, pp. 1705-1712 (8 pages)

Funding: National Natural Science Foundation of China (61972334).

Abstract: Traditional graph-based outlier detection methods construct the transition probability matrix from the overall distribution of the data, so the local information of the data is easily ignored and detection accuracy is low, whereas using only the local information of the data may lead to the "dangling link" problem. To address these problems, an Outlier Detection algorithm based on Hologram Stationary Distribution Factor (HSDFOD) was proposed. Firstly, a local information graph was constructed by adaptively obtaining the neighbor set of each data point from the similarity matrix. Then, a global information graph was constructed by using the minimum spanning tree. Finally, the local information graph and the global information graph were fused into a hologram, from which a transition probability matrix was constructed for a Markov random walk, and outliers were detected through the resulting stationary distribution. On the synthetic datasets A1 to A4, HSDFOD achieved higher precision than SOD (Outlier Detection in axis-parallel Subspaces of high dimensional data), SUOD (accelerating large-Scale Unsupervised heterogeneous Outlier Detection), IForest (Isolation Forest) and HBOS (Histogram-Based Outlier Score), and its Area Under Curve (AUC) was also generally better than those of the four comparison algorithms. On the real datasets, the precision of HSDFOD was higher than 80% in all cases, and its AUC was higher than those of SOD, SUOD, IForest and HBOS. These results indicate that the proposed algorithm has good application prospects in outlier detection.
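
The three steps in the abstract (local graph, minimum-spanning-tree global graph, random walk on the fused graph) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the adaptive neighbor rule, the similarity measure, or the definition of the stationary distribution factor. The sketch therefore assumes a fixed `k` nearest neighbors in place of the adaptive rule, a similarity of 1 / (1 + d^2), a PageRank-style restart probability to guarantee convergence, and that points with lower stationary probability are more outlying. The function name `stationary_scores` and all of its parameters are hypothetical.

```python
import math


def stationary_scores(points, k=3, restart=0.15, iters=200):
    """Stationary probability of each point for a random walk on a fused graph.

    Assumptions (not taken from the paper): fixed-k neighbours, similarity
    1 / (1 + d^2), and a uniform restart with probability `restart`.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    sim = [[1.0 / (1.0 + d * d) for d in row] for row in dist]

    # Local information graph (directed): link each point to its k nearest
    # neighbours. This stands in for the paper's adaptive neighbour rule.
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in sorted(others, key=lambda j: dist[i][j])[:k]:
            adj[i][j] = sim[i][j]

    # Global information graph: minimum spanning tree (Prim's algorithm).
    # Its symmetric edges are merged into the same adjacency matrix, so the
    # fused graph is connected and no node is left without an incoming link.
    in_tree = [False] * n
    in_tree[0] = True
    best = [(dist[0][j], 0) for j in range(n)]  # (cost to tree, parent)
    for _ in range(n - 1):
        j = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v][0])
        parent = best[j][1]
        in_tree[j] = True
        adj[j][parent] = sim[j][parent]
        adj[parent][j] = sim[parent][j]
        for v in range(n):
            if not in_tree[v] and dist[j][v] < best[v][0]:
                best[v] = (dist[j][v], j)

    # Transition probability matrix: row-normalise the fused edge weights.
    trans = []
    for row in adj:
        total = sum(row)
        trans.append([w / total for w in row])

    # Markov random walk with restart, iterated towards its stationary distribution.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [
            restart / n + (1.0 - restart) * sum(pi[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return pi


# Usage: a 3 x 3 grid of inliers plus one distant point (index 9).
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
pi = stationary_scores(grid + [(10.0, 10.0)])
suspect = min(range(len(pi)), key=pi.__getitem__)
```

In this toy example the distant point is reachable only through a single low-weight spanning-tree edge, so the walk rarely visits it and `suspect` is its index. Ranking by raw stationary probability is only a stand-in here; the paper's own factor may be defined differently.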

Keywords: outlier; hologram; transition probability matrix; Markov random walk; stationary distribution factor

Classification number (CLC): TP311 [Automation and Computer Technology: Computer Software and Theory]

 
