ODIC-DBSCAN: A New Algorithm for Analyzing Inner-Cluster Outliers  (Cited by: 7)

ODIC-DBSCAN: A New Analytical Algorithm for Inliers


Authors: WANG Yue-Fei [1], YU Jiong [1,2], SU Guo-Ping [3], QIAN Yu-Rong [2], LIAO Bin [4], LIU Su

Affiliations: [1] College of Information Science and Engineering, Xinjiang University, Urumqi 830046; [2] School of Software, Xinjiang University, Urumqi 830008; [3] The Economic and Information Commission of Xinjiang Uyghur Autonomous Region, Urumqi 830000; [4] School of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012

Source: Acta Automatica Sinica (自动化学报), 2019, No. 11, pp. 2107-2127 (21 pages)

Funding: Supported by the National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078)

Abstract: Outlier detection has long focused on discrete points at cluster edges. When clustering yields fewer clusters than actually exist, or when outliers are disguised inside a cluster, identifying inner-cluster outliers (inliers) becomes more difficult. To detect them, the paper proposes ODIC-DBSCAN (Outlier detection of inner-cluster based on DBSCAN), a method built on the density-based clustering algorithm DBSCAN (Density based spatial clustering of application with noise). First, a distance matrix is built, the proposed Radius Obtaining Strategy derives a set of k effective radii for the point set, and a density matrix is constructed from them. Next, a point-set covering model is established on the idea that covering multidimensional cubes constructed from adjacent effective radii can cover the point set; the Lagrange multiplier method is used to obtain the optimal ratio between the numbers of covering multidimensional cubes, and a group of point-ratio thresholds is output. Finally, the ODIC-DBSCAN outlier detection procedure is reconstructed, with the fusion of clusters taken as the termination condition of the algorithm. Experiments on three kinds of point sets (synthetic point sets, public clustering benchmarks, and UCI real-world datasets) demonstrate the clustering process and analyze the algorithm's performance. Comparisons with other clustering and outlier detection methods show that ODIC-DBSCAN determines inner-cluster outliers more effectively.
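The first step in the abstract (distance matrix, then a density matrix built from a set of radii) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the Radius Obtaining Strategy, so the radii below are hand-picked hypothetical values, and the function names `distance_matrix` and `density_matrix`, as well as the choice to count neighbors within each radius, are assumptions made only for illustration.

```python
import numpy as np


def distance_matrix(points):
    """Pairwise Euclidean distances between rows of `points`, shape (n, n)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def density_matrix(dist, radii):
    """density[i, j] = number of OTHER points within radii[j] of point i.

    Assumption: "density" is taken here to mean a neighbor count per radius;
    the paper may define it differently.
    """
    within = dist[:, :, None] <= np.asarray(radii)[None, None, :]
    return within.sum(axis=1) - 1  # subtract 1 to exclude the point itself


# Toy data: four points on a unit square plus one point far away.
points = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [1.0, 1.0],
    [5.0, 5.0],
])

# Hypothetical radii, chosen by hand (not derived by the paper's strategy).
radii = [1.0, 1.5, 6.0]

dist = distance_matrix(points)
density = density_matrix(dist, radii)
print(density)
```

In this toy case the distant point has no neighbors at the two smaller radii, while each point on the square already has two or three. A row of the density matrix therefore shows how a point's neighborhood fills in as the radius grows, which is the kind of information a density-based method can threshold on.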

Keywords: clustering; DBSCAN; inner-cluster outliers; density association; outlier detection

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
