Authors: LIU Yamei (刘亚梅); YAN Renwu (闫仁武) (School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003)
Affiliation: [1] School of Computer Science, Jiangsu University of Science and Technology
Source: Computer & Digital Engineering (《计算机与数字工程》), 2019, No. 6, pp. 1320-1325 (6 pages)
Abstract: Local outlier detection is an important research direction in data mining. As data volumes grow explosively, mining outliers becomes increasingly valuable, yet current detection algorithms have many shortcomings when handling large-scale data. This paper combines the traditional outlier detection algorithm LOF with the MapReduce distributed framework on the Hadoop platform to implement a parallelization strategy, and improves it with the density-based clustering algorithm DBSCAN. Compared with the LOF algorithm and other improved algorithms, the proposed algorithm improves both efficiency and accuracy. Moreover, its running efficiency improves as the number of data nodes in the Hadoop system increases. The experimental results show that the algorithm is feasible for processing large-scale data.
Keywords: local outlier detection; density clustering; Hadoop; MapReduce; parallelization; local outlier factor
Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]
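
For readers unfamiliar with the LOF (local outlier factor) score named in the abstract, the following is a minimal single-machine sketch of the standard LOF definition. It is not the paper's parallel MapReduce implementation and does not include the DBSCAN-based improvement, whose details the abstract does not give. The function name `lof_scores`, the parameter `k`, and the sample points are illustrative assumptions. The sketch also simplifies the definition by taking exactly k neighbors (ignoring distance ties) and assumes all points are distinct.

```python
import math


def lof_scores(points, k):
    """Return the local outlier factor of every point (brute force, O(n^2)).

    Assumptions for this sketch: points are distinct, k < len(points),
    and ties at the k-th distance are broken by input order.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Four points forming a tight square plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

In this toy data set the four clustered points receive a score of 1, while the isolated point scores well above 1, which is the usual signal that a point is a local outlier. The quadratic distance matrix is what makes a single-machine approach costly on large data, which is the motivation the abstract gives for distributing the computation.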