Authors: ZHANG Zhongping [1,2,3]; GUO Xin; ZHANG Yuting; ZHANG Ruibo
Affiliations: [1] School of Information Science and Engineering, Yanshan University, Qinhuangdao Hebei 066004, China; [2] Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province (Yanshan University), Qinhuangdao Hebei 066004, China; [3] Key Laboratory of Software Engineering of Hebei Province (Yanshan University), Qinhuangdao Hebei 066004, China; [4] School of International Education, Wuhan University of Technology, Wuhan Hubei 430070, China
Source: Journal of Computer Applications (《计算机应用》), 2023, No. 6, pp. 1705-1712 (8 pages)
Funding: National Natural Science Foundation of China (61972334).
Abstract: Traditional graph-based outlier detection methods construct the transition probability matrix from the overall distribution of the data, which tends to ignore local information and lowers detection accuracy, whereas relying on local information alone can cause the "dangling link" problem. To address both issues, an Outlier Detection algorithm based on Hologram Stationary Distribution Factor (HSDFOD) is proposed. First, a local information graph is constructed by adaptively obtaining the neighbor set of each data point from the similarity matrix. Then, a global information graph is constructed using the minimum spanning tree. Finally, the local and global information graphs are fused into a hologram, from which a transition probability matrix is built for a Markov random walk, and outliers are detected from the resulting stationary distribution. On the synthetic datasets A1 to A4, HSDFOD achieves higher precision than SOD (Outlier Detection in axis-parallel Subspaces of high dimensional data), SUOD (accelerating large-Scale Unsupervised heterogeneous Outlier Detection), IForest (Isolation Forest), and HBOS (Histogram-Based Outlier Score), and its Area Under Curve (AUC) is also generally better than that of these four comparison algorithms. On the real datasets, the precision of HSDFOD is above 80% in every case, and its AUC is higher than those of SOD, SUOD, IForest, and HBOS. These results suggest that the proposed algorithm has good application prospects in outlier detection.
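The three steps in the abstract (local graph, minimum-spanning-tree global graph, random walk on the fused graph) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the adaptive neighbor rule, the similarity function, or the stationary distribution factor itself. The sketch assumes a fixed `k` nearest neighbors, a Gaussian similarity with the mean pairwise distance as bandwidth, a lazy random walk to guarantee convergence, and at least two input points. All function names are hypothetical. Under these assumptions, points with low stationary probability are the outlier candidates.

```python
import math

def pairwise_distances(points):
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def knn_edges(dist, k):
    """Local graph: link each point to its k nearest neighbours.
    A fixed k is an assumption; the paper selects neighbours adaptively."""
    n = len(dist)
    edges = set()
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in nearest:
            edges.add((min(i, j), max(i, j)))
    return edges

def mst_edges(dist):
    """Global graph: minimum spanning tree built with Prim's algorithm."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

def stationary_scores(points, k=3, iters=500):
    """Stationary distribution of a lazy random walk on the fused graph."""
    n = len(points)
    dist = pairwise_distances(points)
    # Fuse local and global graphs; the MST keeps the graph connected,
    # so no node is left without an outgoing transition.
    edges = knn_edges(dist, k) | mst_edges(dist)
    positive = [d for row in dist for d in row if d > 0]
    sigma = sum(positive) / len(positive)  # assumed bandwidth: mean distance
    weights = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = math.exp(-((dist[i][j] / sigma) ** 2))  # assumed Gaussian similarity
        weights[i][j] = weights[j][i] = w
    # Row-normalise into transition probabilities; the 0.5 self-loop (lazy walk)
    # leaves the stationary distribution unchanged but makes the chain aperiodic.
    transition = []
    for i in range(n):
        total = sum(weights[i])
        transition.append([(0.5 if i == j else 0.0) + 0.5 * weights[i][j] / total
                           for j in range(n)])
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration
        pi = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return pi
```

Calling `stationary_scores(points, k=2)` on a small cluster plus one distant point should give that distant point the smallest stationary probability, because its only edges carry very low similarity weight. The dense `n x n` loops are chosen for readability and only suit small inputs.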
Keywords: outlier; hologram; transition probability matrix; Markov random walk; stationary distribution factor
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]