Two-stage Outlier Detection Method Based on DBSCAN Clustering and LAOF of Hybrid Data

Cited by: 15


Authors: Shi Hongyan (石鸿雁)[1], Ma Xiaojuan (马晓娟)

Affiliation: [1] Shenyang University of Technology, Shenyang 110870, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2018, No. 1, pp. 74-77 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (61074005)

Abstract: Most existing outlier detection algorithms for mixed-attribute data suffer from low detection quality. To address this, the paper proposes a two-stage outlier detection algorithm for mixed data that combines an improved DBSCAN clustering step with a new local outlier factor, LAOF. In standard DBSCAN the parameters ε and MinPts must be set manually, which can lead to poor clustering quality. The improved version takes the number of nearest neighbors K as input in place of MinPts and derives the clustering radius from the K nearest neighbors, which reduces the number of input parameters and improves clustering quality. In the first stage, the improved DBSCAN algorithm performs a preliminary screening of the mixed data. In the second stage, the newly constructed LAOF, a local outlier factor based on regional density, measures the degree of local anomaly of the data objects that remain after screening. When measuring distances between mixed-data objects, attribute weights are determined from normalized information-entropy differences, and the weights are determined a second time in the second stage. Finally, the algorithm is validated on real data, and the results show that it can improve the accuracy of outlier detection.
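The abstract does not give the exact rule for deriving the clustering radius from the K nearest neighbors, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes purely numeric data with Euclidean distance and takes ε as the mean distance to each point's K-th nearest neighbor; the function names `knn_eps` and `candidate_outliers` are hypothetical, and the entropy-weighted mixed-data distance and the LAOF stage are not modeled.

```python
import numpy as np

def knn_eps(X, k):
    """Assumed rule (not from the paper): eps = mean distance to the k-th nearest neighbor."""
    # Pairwise Euclidean distances; numeric attributes only in this sketch.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dist, axis=1)[:, k]
    return float(kth.mean()), dist

def candidate_outliers(X, k):
    """Return indices of points that are neither core points nor within eps of one."""
    eps, dist = knn_eps(X, k)
    # Neighbor count excludes the point itself; k plays the role of MinPts.
    n_neighbors = (dist <= eps).sum(axis=1) - 1
    core = n_neighbors >= k
    # A non-core point within eps of a core point is a border point, not noise.
    near_core = (dist[:, core] <= eps).any(axis=1)
    return np.where(~(core | near_core))[0], eps

if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]], dtype=float)
    idx, eps = candidate_outliers(X, k=2)
    print(idx, round(eps, 3))
```

With a single input K, the screening stage needs no hand-tuned ε. A mean-based radius is itself pulled upward by extreme points, so a median or another robust statistic may be a reasonable alternative under the same assumptions.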

Keywords: data mining; outlier detection; information entropy; clustering; weighted distance

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]

 
