Clustering High-Dimensional Data Using PCA-Hubness  Cited by: 1

Authors: Ge Liang (葛亮)[1], Lang Jiangtao (郎江涛)[1], Tang Huang (唐黄)[1], Tang Yunheng (唐允恒)

Affiliation: [1] College of Computer Science, Chongqing University, Chongqing 400044, China

Source: Modern Computer (《现代计算机(中旬刊)》), 2017, No. 4, pp. 52-55, 59 (5 pages)

Abstract: Hub-based clustering can address high-dimensional data that traditional clustering algorithms cannot handle. However, it does not account for redundant and noisy features in the data, which degrades clustering performance. The PCA-Hubness clustering method is therefore proposed to improve clustering performance on high-dimensional data. The method exploits the relationship between the skewness of the reverse nearest neighbor count and the intrinsic dimension, and uses the rate of change of the skewness as the criterion for dimensionality reduction. This ensures that reducing the dimension of high-dimensional data does not discard too much valuable information, which in turn benefits clustering quality. In experiments on UCI data sets, the silhouette coefficient improves by an average of 15% compared with the hub-based clustering algorithm.
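The abstract gives the idea but not the exact procedure, so the following is only a minimal sketch of how the skewness of the reverse nearest neighbor count could be tracked while PCA reduces the dimension. The function names, the neighborhood size `k`, the threshold `tol`, and the stopping rule in `pick_dimension` are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def k_occurrence_skewness(X, k=10):
    """Skewness of N_k, the number of times each point appears in the
    k-nearest-neighbor lists of the other points (its reverse-neighbor count)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]               # indices of the k nearest neighbors
    n_k = np.bincount(knn.ravel(), minlength=n).astype(float)
    mu, sigma = n_k.mean(), n_k.std()
    if sigma == 0.0:
        return 0.0
    return float(np.mean((n_k - mu) ** 3) / sigma ** 3)


def pca_project(X, d):
    """Project centered data onto its first d principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:d].T


def skewness_profile(X, dims, k=10):
    """Skewness of N_k after reducing X to each dimension in dims."""
    return [k_occurrence_skewness(pca_project(X, d), k) for d in dims]


def pick_dimension(dims, skews, tol=0.1):
    """Hypothetical rule (an assumption, not the paper's criterion): walk from
    the largest dimension downward and stop just before the relative change
    in skewness between consecutive dimensions exceeds tol."""
    chosen = dims[0]
    for i in range(1, len(dims)):
        rate = abs(skews[i] - skews[i - 1]) / (abs(skews[i - 1]) + 1e-12)
        if rate > tol:
            break
        chosen = dims[i]
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))        # synthetic stand-in for a UCI data set
    dims = [50, 40, 30, 20, 10, 5]        # candidate dimensions, largest first
    skews = skewness_profile(X, dims, k=10)
    print(list(zip(dims, skews)))
    print("chosen dimension:", pick_dimension(dims, skews))
```

In a full pipeline, the data projected to the chosen dimension would then be passed to a hub-based clustering algorithm and scored with the silhouette coefficient; both steps are omitted here because the abstract does not specify them.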

Keywords: hub-based clustering; high-dimensional data; skewness; intrinsic dimension; PCA

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
