An Outlier Detection Algorithm for Mixed Data Based on MapReduce (cited by: 3)

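Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006 [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan 030006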

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 9, pp. 1961-1966 (6 pages)

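Funding: Supported by the National Natural Science Foundation of China (71031006); the Shanxi Province Science and Technology Basic Conditions Platform Construction Project (2012091002-0101); and the Shanxi Province Research Project for Returned Overseas Scholars (2013-101)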

Abstract: When processing large mixed datasets, existing outlier detection algorithms often suffer from high time cost and poor applicability. To address this, the paper proposes an outlier detection algorithm for mixed data based on nearest neighbors. The algorithm first defines a dissimilarity measure between mixed data objects using the idea of neighborhood counting, then defines an outlier factor for each object based on its nearest neighbors; the objects with the largest outlier factor values are reported as outliers. To further improve efficiency, a parallel version of the algorithm is designed on the MapReduce programming model. Experiments on UCI datasets, compared against existing algorithms, show that the proposed algorithm detects outliers effectively, with the advantages of few parameters and high detection precision, and that the parallel implementation improves the efficiency and scalability of outlier detection on large mixed datasets.
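The pipeline outlined in the abstract (a mixed-data dissimilarity, a nearest-neighbor outlier factor, and a map/reduce split of the work) can be illustrated with a minimal sketch. The abstract does not give the neighborhood-counting measure or the exact outlier factor, so the code below substitutes a Gower-style dissimilarity and uses the mean dissimilarity to the k nearest neighbors as the score; both are assumptions, not the paper's definitions. The map and reduce steps are simulated in-process with plain Python rather than run on Hadoop, and every function name (`dissimilarity`, `mapper`, `reducer`, `detect_outliers`) is hypothetical.

```python
from collections import defaultdict
import heapq


def dissimilarity(a, b, numeric_ranges):
    """Gower-style stand-in (NOT the paper's neighborhood-counting measure):
    numeric attributes add a range-normalised absolute difference,
    categorical attributes add 0 on a match and 1 on a mismatch."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            span = numeric_ranges[i]
            total += abs(x - y) / span if span else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def mapper(split, data, numeric_ranges, k):
    """For every object, emit its k smallest dissimilarities to the
    objects whose indices belong to this split."""
    for i, obj in enumerate(data):
        dists = [dissimilarity(obj, data[j], numeric_ranges)
                 for j in split if j != i]
        yield i, heapq.nsmallest(k, dists)


def reducer(i, partial_lists, k):
    """Merge the per-split candidates for object i and score it by the
    mean dissimilarity to its k nearest neighbours (assumed factor)."""
    merged = heapq.nsmallest(k, (d for part in partial_lists for d in part))
    return i, sum(merged) / len(merged)


def detect_outliers(data, numeric_idx, k=2, top_n=1, n_splits=2):
    """Return the indices of the top_n highest-scoring objects and all scores."""
    numeric_ranges = {
        i: max(row[i] for row in data) - min(row[i] for row in data)
        for i in numeric_idx
    }
    splits = [list(range(s, len(data), n_splits)) for s in range(n_splits)]

    # Shuffle step: group mapper output by object index.
    grouped = defaultdict(list)
    for split in splits:
        for i, partial in mapper(split, data, numeric_ranges, k):
            grouped[i].append(partial)

    scores = dict(reducer(i, parts, k) for i, parts in grouped.items())
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores


if __name__ == "__main__":
    # Toy records: one numeric attribute and one categorical attribute.
    records = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (1.2, "a"), (9.0, "b")]
    top, all_scores = detect_outliers(records, numeric_idx={0})
    print(top, all_scores)
```

In this sketch each mapper only sees one split of the data, so the reducer has to merge the partial candidate lists before scoring; that merge is what lets the pairwise comparisons be spread across workers.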

Keywords: outlier detection; mixed data; neighborhood counting; MapReduce

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
