Institution: [1] School of Cryptographic Engineering, PLA Information Engineering University, Zhengzhou 450000
Source: 《计算机科学》 (Computer Science), 2016, No. 7, pp. 251-254, 274 (5 pages in total)
Abstract: Existing outlier detection algorithms are time-inefficient when applied to large-scale data sets. To address this, a parallel outlier detection algorithm based on K-nearest neighbors, PODKNN (Parallel Outlier Detection Based on K-nearest Neighborhood), is proposed. The algorithm preprocesses the data set with a partitioning strategy, finds the K nearest neighbors and computes the outlier degree of each point within the smaller subsets, and finally merges the results and selects the outliers. The algorithm is designed to fit the MapReduce programming model so that it runs in parallel, which improves the computational efficiency of outlier detection on large-scale data. Experimental results show that PODKNN achieves a high speedup and good scalability.
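The abstract describes the pipeline only at a high level: partition, local K-nearest-neighbor search, merge, select. The following is a minimal single-machine sketch of that flow in plain Python, built on assumptions that are ours rather than the paper's: round-robin partitioning, Euclidean distance, the mean distance to the K nearest neighbors as the outlier degree, and every query point being sent to every partition so that merging the per-partition candidates yields the exact global neighbors. The names `partition`, `map_chunk`, `reduce_point` and `detect_outliers` are hypothetical and do not come from PODKNN.

```python
import heapq
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition(points, num_parts):
    """Split (index, point) pairs round-robin into num_parts chunks.

    Hypothetical partitioning strategy; the paper's scheme is not specified here.
    """
    chunks = [[] for _ in range(num_parts)]
    for i, p in enumerate(points):
        chunks[i % num_parts].append((i, p))
    return chunks


def map_chunk(chunk, points, k):
    """Map phase: for every query point, emit its k nearest distances inside this chunk."""
    for qi, q in enumerate(points):
        # Skip the query point itself so its zero self-distance is not counted.
        cands = heapq.nsmallest(k, (euclidean(q, p) for pi, p in chunk if pi != qi))
        yield qi, cands


def reduce_point(candidate_lists, k):
    """Reduce phase: merge local candidates into the global k nearest and score them.

    Outlier degree = mean distance to the k nearest neighbors (assumed definition).
    """
    merged = heapq.nsmallest(k, (d for cands in candidate_lists for d in cands))
    if not merged:  # only possible when the data set has a single point
        return 0.0
    return sum(merged) / len(merged)


def detect_outliers(points, k, n_outliers, num_parts=4):
    """Return indices of the n_outliers points with the highest outlier degree."""
    grouped = defaultdict(list)  # simulated shuffle: group map output by point index
    for chunk in partition(points, num_parts):
        for qi, cands in map_chunk(chunk, points, k):
            grouped[qi].append(cands)
    scores = {qi: reduce_point(lists, k) for qi, lists in grouped.items()}
    return heapq.nlargest(n_outliers, scores, key=scores.get)
```

For example, with a tight cluster near the origin plus one distant point, `detect_outliers([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)], k=2, n_outliers=1)` returns `[5]`. Keeping only K candidates per partition is safe because each of a point's global K nearest neighbors must also be among the K nearest inside its own partition; the trade-off in this sketch is that every map task scans all query points, a cost a real MapReduce job would reduce through its partitioning scheme.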
Keywords: data mining; outlier detection; K-nearest neighbor; MapReduce
Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]