结合局部敏感哈希和随机游走的异常检测算法  被引量:2

Outlier Detection Algorithm Combining Locality Sensitive Hashing and Random Walks

在线阅读下载全文

作  者:舒敏 刘华文[1] 郑忠龙[1] 徐晓丹[1] SHU Min;LIU Huawen;ZHENG Zhonglong;XU Xiaodan(College of Mathematics,Physics and Information Engineering,Zhejiang Normal University,Jinhua,Zhejiang 321004,China)

机构地区:[1]浙江师范大学数理与信息工程学院,浙江金华321004

出  处:《计算机科学与探索》2018年第12期1950-1960,共11页Journal of Frontiers of Computer Science and Technology

基  金:国家自然科学基金No.61572443;浙江省自然科学基金No.LY14F020019;上海市智能信息处理重点实验室开放课题No.IIPL-2016-001~~

摘  要:异常检测是数据挖掘的主要研究热点问题之一。目前已存在很多异常检测的方法,但是现存的主要异常检测方法在高维数据处理过程中效率较低。为解决此问题,提出了一种高效的异常检测算法。该算法结合局部敏感哈希的性质和图的随机游走来识别异常点。具体而言,通过局部敏感哈希实现对高维数据的高效处理,随后利用数据之间距离获取其相似性,并将其转化为随机游走的转移概率。在此基础上,使用随机游走技术计算数据之间的游走概率,其中正常数据之间的转移概率越来越高,而异常点的概率越来越低,进而根据此性质最终辨别异常数据。实验结果表明,提出的方法能有效检测出数据中的异常,总体上优于其他异常检测算法。Outlier detection is one of the main research hotspots of data mining.There are many outlier detection methods,but the existing main outlier detection methods are inefficient in high dimensional data processing.In order to solve this problem,this paper proposes an efficient outlier detection algorithm.The algorithm combines the properties of locality sensitive Hashing and the random walks of the graph to identify outliers.Specifically,locality sensitive Hashing is used to efficiently process high-dimensional data,then the similarity is obtained by using the distance between data,and it is transformed into a transition probability of random walks.On this basis,the random walks technology is used to calculate the transfer probability between the data,in which the transfer probability between normal data will be higher and higher,and the probability of anomaly will be lower and lower,and then the anomaly data can be finally distinguished according to this property.The experimental results show that the proposed method can effectively detect outliers in the data and is generally better than other outlier detection algorithms.

关 键 词:异常点检测 局部敏感哈希 随机游走 数据挖掘 

分 类 号:TP181[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象