Pruning-based Outlier Mining from Large Dataset

Times cited: 6

Source: Computer Science (《计算机科学》), 2012, No. 10, pp. 152-156 (5 pages)

Abstract: Distance-based outlier mining typically requires O(N²) time for its large number of distance computations and comparisons, and this quadratic cost limits its application to large datasets. To address this limitation, an outlier mining algorithm with pruning is proposed. The algorithm consists of two phases. In the first phase, a single scan of the dataset prunes the majority of non-outliers. In the second phase, an improved nested-loop algorithm is applied to the remaining suspicious points: the average distance from each point to its k nearest neighbors serves as its degree of outlierness, and the top-n outliers are reported. Experiments on both real-life and synthetic datasets show that the algorithm achieves a high hit rate while keeping a low false alarm rate. Compared with related algorithms, it has lower time complexity.
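The two-phase scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the phase-1 pruning rule, so the sketch assumes a hypothetical grid-density rule (points in grid cells holding at least `min_count` points are discarded as inliers), and it models the "improved nested loop" with a cutoff rule in the spirit of Bay and Schwabacher's ORCA (a candidate is abandoned once the mean of its k best distances so far falls below the n-th best score found). The names `prune_by_grid`, `top_n_outliers`, `cell_size` and `min_count` are all hypothetical.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def prune_by_grid(data: Sequence[Point], cell_size: float, min_count: int) -> List[int]:
    """Phase 1 (HYPOTHETICAL rule, not from the paper): one pass assigns every
    point to a grid cell; points in cells with >= min_count members are pruned."""
    cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, p in enumerate(data):
        cells[tuple(math.floor(x / cell_size) for x in p)].append(idx)
    # Only points lying in sparse cells survive as suspicious candidates.
    return [i for members in cells.values() if len(members) < min_count for i in members]


def top_n_outliers(data: Sequence[Point], candidates: Sequence[int],
                   k: int, n: int) -> List[Tuple[float, int]]:
    """Phase 2: nested loop scoring each candidate by its mean distance to its
    k nearest neighbours in the FULL data set; returns (score, index), best first."""
    top: List[Tuple[float, int]] = []  # min-heap holding the n best (score, index) pairs
    for i in candidates:
        # Cutoff = n-th best score so far; 0 until n candidates have been scored.
        cutoff = top[0][0] if len(top) == n else 0.0
        neigh: List[float] = []  # negated distances: a max-heap of the k nearest so far
        total = 0.0              # running sum of the k nearest distances so far
        abandoned = False
        for j, q in enumerate(data):
            if j == i:
                continue
            d = math.dist(data[i], q)
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
                total += d
            elif d < -neigh[0]:
                # heapreplace returns the evicted (negated) largest distance.
                total += d + heapq.heapreplace(neigh, -d)
            # total / k can only shrink as more points are seen, so it bounds the
            # final score from above; below the cutoff the candidate cannot make top-n.
            if len(neigh) == k and total / k < cutoff:
                abandoned = True
                break
        if abandoned or len(neigh) < k:
            continue
        score = total / k
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
    return sorted(top, reverse=True)


if __name__ == "__main__":
    # A dense 10 x 10 grid plus two isolated points (indices 100 and 101).
    data = [(x * 0.1, y * 0.1) for x in range(10) for y in range(10)] + [(5.0, 5.0), (-4.0, 6.0)]
    cands = prune_by_grid(data, cell_size=0.5, min_count=3)
    print(top_n_outliers(data, cands, k=3, n=2))
```

Note that neighbours are searched over the whole dataset, not only the surviving candidates, because a pruned inlier can still be a near neighbour of a suspicious point. The assumed grid rule is a heuristic: a true outlier that happens to fall in a dense cell would be lost, which is one way a real pruning rule trades hit rate against speed.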

Keywords: outlier; data mining; distance-based

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
