Affiliation: [1] Shenyang University of Technology, Shenyang 110870
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2018, No. 1, pp. 74-77 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (61074005)
Abstract: Most existing outlier detection algorithms for mixed-attribute data achieve only limited detection quality. To address this, the paper proposes a two-stage outlier detection algorithm for mixed data that combines an improved DBSCAN clustering with a new local outlier factor, LAOF. Standard DBSCAN requires the parameters ε and MinPts to be set manually, which leads to poor clustering quality. The improved version instead takes the number of K nearest neighbors as input in place of MinPts and determines the clustering radius from the K nearest neighbors, reducing the number of input parameters and improving clustering quality. In the first stage, the improved DBSCAN performs a preliminary screening of the mixed data; in the second stage, LAOF, a newly constructed local anomaly factor based on region density, computes the degree of local anomaly of each screened data object. When measuring distances on mixed data, attribute weights are determined from normalized information-entropy differences, and the weights are determined a second time in the second stage. Finally, the algorithm is validated on real data, and the results show that it improves outlier detection accuracy.
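The first-stage screening described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following choices are hypothetical: the mixed distance (range-normalized numeric difference plus a 0/1 categorical mismatch), the rule "eps = mean distance to each point's K-th nearest neighbor", and MinPts = K. The `weights` argument is a placeholder for the paper's entropy-based attribute weights. LAOF is not reproduced here because the abstract does not define it.

```python
def mixed_distance(a, b, num_idx, cat_idx, ranges, weights):
    """Hypothetical weighted distance between two mixed-attribute records."""
    total = 0.0
    for j in num_idx:
        span = ranges[j] or 1.0  # avoid division by zero for constant attributes
        total += weights[j] * abs(a[j] - b[j]) / span
    for j in cat_idx:
        total += weights[j] * (0.0 if a[j] == b[j] else 1.0)
    return total


def knn_radius(data, dist, k):
    """Assumed heuristic: eps = mean distance to each point's k-th nearest neighbor."""
    kth = []
    for i, p in enumerate(data):
        ds = sorted(dist(p, q) for j, q in enumerate(data) if j != i)
        kth.append(ds[k - 1])
    return sum(kth) / len(kth)


def dbscan(data, dist, eps, min_pts):
    """Plain DBSCAN; label -1 marks noise, i.e. outlier candidates for stage two."""
    n = len(data)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neigh = [j for j in range(n) if dist(data[i], data[j]) <= eps]
        if len(neigh) < min_pts:  # neighborhood count includes the point itself
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neigh if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = [m for m in range(n) if dist(data[j], data[m]) <= eps]
            if len(nj) >= min_pts:  # only core points expand the cluster
                queue.extend(nj)
    return labels


# Toy mixed data: one numeric attribute (index 0), one categorical attribute (index 1).
data = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (1.05, "a"), (5.0, "b")]
ranges = {0: 5.0 - 0.9}
weights = {0: 1.0, 1: 1.0}  # placeholder for entropy-based weights


def dist(a, b):
    return mixed_distance(a, b, [0], [1], ranges, weights)


k = 2
eps = knn_radius(data, dist, k)
labels = dbscan(data, dist, eps, min_pts=k)  # MinPts replaced by K (assumed)
print(eps, labels)
```

On this toy data the four similar records form one cluster, and the dissimilar record `(5.0, "b")` is labeled noise (`-1`). In a two-stage scheme, only such screened objects would then be scored by a local density-based factor.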
Classification: TP301 [Automation and Computer Technology / Computer System Architecture]