Authors: WANG Xite (王习特); ZHU Zongmei (朱宗梅); YU Xueping (于雪苹); BAI Mei (白梅) (College of Information Science and Technology, Dalian Maritime University, Dalian 116000, China)
Affiliation: [1] College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116000, China
Source: Journal of Hunan University: Natural Sciences (《湖南大学学报(自然科学版)》), 2020, No. 10, pp. 100-110 (11 pages)
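Funding: National Natural Science Foundation of China (61602076, 61702072, 61976032); Natural Science Foundation of Liaoning Province (20180540003); China Postdoctoral Science Foundation, General Program (2017M611211, 2017M621122); National Key Research and Development Program of China (2017YFC1404606); Fundamental Research Funds for the Central Universities (3132019202).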
Abstract: Outlier detection is an active research topic in data mining; its main purpose is to identify abnormal but valuable data points in a data set. As data volumes grow, the efficiency of processing massive data declines, which has led to the introduction of distributed algorithms. Most existing distributed algorithms target homogeneous distributed environments. In practice, however, the processors taking part in a distributed computation differ in configuration, so existing distributed outlier detection algorithms do not adapt well to heterogeneous distributed environments. To address this problem, this paper proposes an outlier detection algorithm for heterogeneous distributed environments. First, a Grid-based Dynamic Data Partitioning (GDDP) method is proposed. It makes full use of the computing resources of each processor and partitions the data according to the spatial location of the data points, which effectively reduces network communication. Second, building on GDDP, a parallel GDDP-based Outlier Detection Algorithm (GODA) for heterogeneous distributed environments is proposed. The algorithm consists of two stages. In the first, each processor locally filters data points in the order in which they appear in the index and obtains the set of outlier candidates in two scans. In the second, the processors that each candidate outlier needs to communicate with are determined, and the global outliers are obtained at low network cost. Finally, extensive experiments verify the effectiveness of the proposed GDDP and GODA algorithms.
Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]