Research on Modification of K-means Spectral Clustering Algorithm of Multidimensional Data (多维数据K-means谱聚类算法改进研究)  Cited by: 2


Authors: Xie Zhiming (谢志明)[1,2], Wang Peng (王鹏), Huang Yan (黄焱)[4]

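Affiliations: [1] Department of Information Engineering, Shanwei Polytechnic, Shanwei, Guangdong 516600; [2] Institute of Cloud Computing and Data Center Engineering Design, Shanwei Innovation Industrial Design and Research Institute, Shanwei, Guangdong 516600; [3] School of Computer Science and Technology, Southwest Minzu University, Chengdu, Sichuan 610041; [4] School of Computer Science and Technology, Huaiyin Normal University, Huai'an, Jiangsu 223300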

Source: Computer Technology and Development (《计算机技术与发展》), 2017, No. 10, pp. 60-64 (5 pages)

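Funding: National Natural Science Foundation of China (60702075); High-Tech Industrialization Science and Technology Key Project of the Guangdong Provincial Department of Science and Technology (2011B010200007); Guangdong Higher Vocational Education Quality Engineering Education and Teaching Reform Project (GDJG2015244; GDJG2015245)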

Abstract: The traditional K-means algorithm cannot determine the initial number of clusters k automatically, and spectral clustering is sensitive to its parameters. To address both problems, a K-means algorithm based on spectral clustering, called PK-means, is proposed. For the selection of k, the algorithm sorts the computed high-density data points and clusters the top 96% of them by density, which yields the cluster number k with high accuracy. It also adopts a fuzzy similarity measure based on spectral clustering that is unaffected by parameters and more stable, using the FCM algorithm to compute a membership degree matrix that determines the similarity between data points. PK-means, K-means, and the density-sensitive spectral clustering algorithm (DSSC) were tested on multidimensional nonlinear datasets. The experimental results show that, on both low-dimensional and high-dimensional datasets, K-means has the lowest processing efficiency, DSSC is slightly better, and PK-means has a clear advantage: it achieves higher clustering accuracy and stronger robustness than the traditional clustering algorithms, and the higher the dimension, the more prominent its clustering performance.
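The two ideas named in the abstract, keeping the densest 96% of points and deriving point-to-point similarity from an FCM membership matrix, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: density is taken as the neighbour count within a hypothetical `radius`, the number of clusters `c` is supplied by the caller instead of being estimated, centers are seeded by farthest-point selection for determinism, and similarity is taken as the inner product of membership rows. The spectral embedding and final K-means step are omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def keep_densest(points, radius, keep_ratio=0.96):
    """Keep the densest `keep_ratio` share of points, in original order.

    Density is the neighbour count within `radius` (an assumed definition).
    """
    r2 = radius * radius
    density = [sum(1 for q in points if sq_dist(p, q) <= r2) for p in points]
    ranked = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    n_keep = max(1, round(keep_ratio * len(points)))
    return [points[i] for i in sorted(ranked[:n_keep])]


def farthest_point_init(points, c):
    """Deterministic seeding (an assumption): start at points[0], then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(nxt))
    return centers


def memberships(points, centers, m):
    """FCM membership rows: u_ik is proportional to D_ik^(-1/(m-1)),
    where D_ik is the squared distance from point i to center k."""
    U = []
    for p in points:
        inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        U.append([x / total for x in inv])
    return U


def fcm(points, c, m=2.0, iters=50):
    """Plain fuzzy c-means; returns an n x c membership matrix."""
    centers = farthest_point_init(points, c)
    dim = len(points[0])
    for _ in range(iters):
        U = memberships(points, centers, m)
        centers = []
        for k in range(c):
            w = [row[k] ** m for row in U]  # fuzzified weights for center k
            total = sum(w)
            centers.append([sum(wi * p[j] for wi, p in zip(w, points)) / total
                            for j in range(dim)])
    return memberships(points, centers, m)


def fuzzy_similarity(U):
    """Assumed similarity: inner product of two points' membership rows."""
    return [[sum(a * b for a, b in zip(ui, uj)) for uj in U] for ui in U]


# Two tight blobs plus one isolated outlier, which the density filter drops.
blob_a = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(3)]
blob_b = [(10 + x, 10 + y) for x, y in blob_a]
data = blob_a + blob_b + [(50.0, 50.0)]
kept = keep_densest(data, radius=1.0)
S = fuzzy_similarity(fcm(kept, c=2))
```

In this toy setup, points from the same blob end up with similarity close to 1 and points from different blobs close to 0, so `S` could serve as the affinity matrix for a spectral clustering step. Because the similarity comes from memberships and not from a Gaussian kernel, no kernel width has to be tuned, which is one plausible reading of the abstract's claim that the measure is unaffected by parameters.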

Keywords: K-means algorithm; spectral clustering algorithm; clustering; FCM algorithm; membership degree matrix

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
