PODKNN: A Parallel Outlier Detection Algorithm for Large Dataset  Cited by: 7

Authors: GOU Jie[1], MA Zitang[1], ZHANG Zhecheng

Affiliation: [1] Institute of Cryptography Engineering, PLA Information Engineering University, Zhengzhou 450000, China

Source: Computer Science (《计算机科学》), 2016, No. 7, pp. 251-254, 274 (5 pages)

Abstract: Existing outlier detection algorithms have low time efficiency when applied to large-scale datasets. To address this, a parallel outlier detection algorithm based on K-nearest neighbors, PODKNN (Parallel Outlier Detection Based on K-nearest Neighborhood), is proposed. The algorithm preprocesses the dataset with a partitioning strategy, searches for the K nearest neighbors and computes outlier degrees within the resulting smaller subsets, and finally merges the results and selects the outliers. The procedure is designed to fit the MapReduce programming model so that it can run in parallel, which improves the computational efficiency of outlier detection on large-scale data. Experimental results show that PODKNN achieves a high speedup and good scalability.

Keywords: data mining; outlier detection; K-nearest neighbor; MapReduce

CLC number: TP301 [Automation and Computer Technology: Computer System Architecture]
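
The pipeline described in the abstract (partition, per-subset K-nearest-neighbor search and outlier degree, merge, select) can be illustrated with a minimal single-process sketch. The abstract does not give the partitioning rule or the definition of outlier degree, so everything below is an assumption for illustration only: `partition_key` buckets points by a grid cell on the first coordinate, the outlier degree is taken as the mean distance to the k nearest neighbors inside the same subset, and the function names (`map_phase`, `reduce_phase`, `podknn_sketch`) are hypothetical. It is not the authors' implementation and does not use a real MapReduce framework.

```python
import heapq
import math
from collections import defaultdict


def partition_key(point, cell_size):
    """Hypothetical partitioning rule (not from the paper):
    bucket a point by the grid cell of its first coordinate."""
    return int(point[0] // cell_size)


def map_phase(points, cell_size):
    """Map step: assign each point to a partition and group by key."""
    groups = defaultdict(list)
    for p in points:
        groups[partition_key(p, cell_size)].append(p)
    return groups


def outlier_degree(index, subset, k):
    """Assumed outlier degree: mean distance from subset[index]
    to its k nearest neighbors within the same subset."""
    point = subset[index]
    dists = [math.dist(point, q) for j, q in enumerate(subset) if j != index]
    nearest = heapq.nsmallest(k, dists)
    # A point alone in its partition has no neighbors; treat it as maximally isolated.
    return sum(nearest) / len(nearest) if nearest else math.inf


def reduce_phase(subset, k):
    """Reduce step: score every point of one partition."""
    return [(outlier_degree(i, subset, k), p) for i, p in enumerate(subset)]


def podknn_sketch(points, k, n_outliers, cell_size):
    """Merge per-partition scores and keep the n_outliers highest-scoring points."""
    scored = []
    for subset in map_phase(points, cell_size).values():
        scored.extend(reduce_phase(subset, k))
    top = heapq.nlargest(n_outliers, scored, key=lambda item: item[0])
    return [p for _, p in top]
```

Calling `podknn_sketch(points, k=2, n_outliers=1, cell_size=10.0)` on a list of coordinate tuples returns the single point with the largest assumed outlier degree. Because neighbors are searched only inside each partition, points near partition borders may be scored against an incomplete neighborhood; how the paper handles this is not stated in the abstract.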

 
