基于多球分裂的增量式k-means聚类算法  被引量:4

Incremental k-means clustering algorithm based on multi-sphere splitting

在线阅读下载全文

作  者:曲福恒[1] 钱超越 杨勇[1,2] 陆洋 宋剑飞 胡雅婷 QU Fu-heng;QIAN Chao-yue;YANG Yong;LU Yang;SONG Jian-fei;HU Ya-ting(College of Computer Science and Technology,Changchun University of Science and Technology,Changchun 130022,China;College of Education,Changchun Normal University,Changchun 130032,China;College of Information Technology,Jilin Agricultural University,Changchun 130118,China)

机构地区:[1]长春理工大学计算机科学技术学院,长春130022 [2]长春师范大学教育学院,长春130032 [3]吉林农业大学信息技术学院,长春130118

出  处:《吉林大学学报(工学版)》2022年第6期1434-1441,共8页Journal of Jilin University:Engineering and Technology Edition

基  金:吉林省教育厅“十三五”科学技术项目(JJKH20181164KJ);吉林省教育厅科学技术研究项目(JJKH20220330KJ);国家自然科学基金项目(41671397)。

摘  要:针对Ball k-means(BKM)算法对初始化中心敏感、求解精度不高的问题,提出了一种基于多球分裂的增量式k-means聚类算法。该算法利用BKM需要记录各球聚类半径的特点,从给定的初始聚类中心个数开始,按照固定步长一次性产生多个增量聚类中心,直至得到k个初始中心。本文算法避免了所有n个s维数据之间的距离计算,在不增加空间复杂度的前提下将传统增量聚类中心选取的计算量由O(n^(2)s)降低至O(k log k)(k<n)。在8组UCI数据上与k-means、BKM、IK-+以及3个典型增量聚类算法的对比实验结果表明,本文算法降低了BKM的初始化敏感性,解的精度提升了37.15%~66.92%,是对比增量算法中唯一能够提升k-means效率的算法,且随着聚类个数的增加,效率优势更加明显。To address the problem that the Ball k-means(BKM)algorithm is sensitive to the initialized centers and the solution accuracy is not high,an incremental k-means clustering algorithm based on multiball splitting is proposed.The algorithm takes advantage of the feature that BKM needs to record the clustering radius of each ball,starts from a given number of initial clustering centers,and generates multiple incremental clustering centers at once according to a fixed step until k initial centers are obtained.The algorithm avoids the calculation of distances between all n s-dimensional data and reduces the computation of traditional incremental clustering center selection from O(n^(2)s)to O(k log k)(k<n)without increasing the space complexity.The experimental results of comparison with k-means,BKM,IK-+and three typical incremental clustering algorithms on 8 UCI datasets show that the proposed algorithm reduces the initialization sensitivity of BKM and improves the solution accuracy by 37.15%~66.92%;it is the only algorithm among the comparative incremental algorithms that can improve the efficiency of k-means,and the efficiency advantage becomes more obvious as the number of clusters increases.

关 键 词:模式识别 K-MEANS 增量聚类 ball k-means 多球分裂 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象