Optimal clustering algorithm based on modified local outlier factor detection  Cited by: 13


Authors: ZHANG Dan-dan (张丹丹); YOU Zi-yi (游子毅) [1]; ZHENG Jian (郑建); CHEN Shi-guo (陈世国) [1]

Affiliations: [1] School of Physics and Electronic Sciences, Guizhou Normal University, Guiyang 550001, Guizhou, China; [2] Guizhou Rural Credit Cooperative Association, Guiyang 550081, Guizhou, China

Source: Microelectronics & Computer (《微电子学与计算机》), 2019, No. 11, pp. 43-48 (6 pages)

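Funding: National Natural Science Foundation of China (61462015); Guizhou Normal University Graduate Innovation Fund Project (YC[2018]016); Guizhou Provincial Science and Technology Plan Project (Qian Ke He LH Zi [2016] No. 7223)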

Abstract: Cluster analysis has long attracted attention from researchers in unsupervised learning, both in China and abroad. To address the K-means algorithm's sensitivity to the initial cluster centers, the weak correlation among data within a cluster, and its tendency to converge to a local optimum, this paper proposes an optimized clustering algorithm based on the outlier factor. The algorithm first uses an information-entropy-weighted Euclidean distance as the similarity measure, so that differences between data objects are distinguished more clearly. It then computes the outlier factor of each data point with a Local Outlier Factor (LOF) detection algorithm whose k-distance parameter is self-adjusting, and selects a candidate set of initial cluster centers. Finally, it optimizes the cluster centers with an outlier-factor-weighted distance method. Experiments on UCI datasets show that the optimized algorithm is more accurate than the K-means++, OFMMK-means, and FCM algorithms, and runs faster than FCM. The algorithm can be better applied to intrusion detection, credit risk assessment, and multi-fault diagnosis.
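The first two stages named in the abstract (entropy-weighted Euclidean distance, then LOF-based screening of candidate centers) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the sketch uses the common entropy weight method and the standard LOF definition with a fixed k (the paper's self-adjusting k-distance rule is not reproduced). The function names and the `threshold` parameter are hypothetical.

```python
import math

def entropy_weights(data):
    """Assumed entropy weight method: lower-entropy attributes get larger weights."""
    n, m = len(data), len(data[0])          # requires n >= 2
    diversity = []
    for j in range(m):
        col = [row[j] for row in data]
        lo, span = min(col), max(col) - min(col)
        # Min-max normalise; a constant column is treated as uniform.
        norm = [(v - lo) / span if span > 0 else 1.0 for v in col]
        total = sum(norm)
        probs = [v / total for v in norm]
        h = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversity.append(max(0.0, 1.0 - h))
    s = sum(diversity)
    if s < 1e-12:                            # no attribute is informative
        return [1.0 / m] * m
    return [d / s for d in diversity]

def weighted_distance(x, y, w):
    """Euclidean distance with per-attribute weights w."""
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))

def lof_scores(data, k, w):
    """Standard LOF with a fixed k (assumption: k < len(data))."""
    n = len(data)
    dist = [[weighted_distance(data[i], data[j], w) for j in range(n)]
            for i in range(n)]
    k_dist, neighbors = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        # The k-neighbourhood includes ties at the k-distance.
        neighbors.append([j for j in order if dist[i][j] <= kd])
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        mean_reach = max(sum(reach) / len(reach), 1e-12)  # guard duplicates
        lrd.append(1.0 / mean_reach)
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]

def candidate_centers(data, k, threshold=1.5):
    """Hypothetical screening rule: keep points whose LOF is at most threshold."""
    w = entropy_weights(data)
    scores = lof_scores(data, k, w)
    return [i for i, s in enumerate(scores) if s <= threshold]

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]                           # last point is an isolated outlier
candidates = candidate_centers(points, k=3)
```

Points in dense regions have an LOF close to 1, so screening by a threshold keeps them as candidate centers and drops isolated points. The final stage, the outlier-factor-weighted refinement of the centers, is omitted here because the abstract does not describe it in enough detail to sketch.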

Keywords: clustering; K-means; weighted Euclidean distance; LOF algorithm; optimization

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
