Supervised Divisive Hierarchical Clustering Method Based on PCA and K-Means Clustering  (Cited by: 6)

PCA and K-means based supervised split hierarchy clustering method


Authors: Pu Luping (浦路平)[1], Zhao Pengda (赵鹏大)[1], Hu Guangdao (胡光道)[1], Zhang Zhenfei (张振飞)[1], Xia Qinglin (夏庆霖)[1]

Affiliation: [1] Institute of Remote Sensing Geology and Mathematical Geology, China University of Geosciences

Source: Application Research of Computers (《计算机应用研究》), 2008, No. 5, pp. 1412-1414 (3 pages)

Funding: National Natural Science Foundation of China (402721122); Guangxi Department of Education funded project (桂教科研[2004]4号)

Abstract: This paper proposes PCASHC (PCA split supervised hierarchy clustering), a new supervised binary divisive hierarchical clustering method based on PCA and K-means clustering. Each cluster is split in two, step by step, by K-means clustering. The two samples that lie farthest apart along the first principal component of the cluster (the samples with the minimum and maximum first-component scores) serve as the initial K-means centers, which removes the nondeterminism caused by choosing the initial centers at random. The variance of the class labels of the samples in a cluster is used as the impurity measure of that cluster; it controls how far the splitting proceeds, which avoids over-fitting and lets the method learn a suitable number of clusters. Classification error was evaluated with 10-fold cross-validation on four standard UCI datasets. Compared with seven other classifiers tested on the same datasets in the same way, PCASHC achieves relatively high classification accuracy.
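The splitting procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `max_impurity` and `min_size` parameters, the numeric encoding of class labels, and the use of plain Lloyd iterations for the 2-means step are all assumptions introduced here, since the abstract does not give these details.

```python
import numpy as np

def pca_initial_centers(X):
    """Pick the samples with the smallest and largest first-principal-component score."""
    Xc = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the first principal axis.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[0]
    return X[[np.argmin(scores), np.argmax(scores)]].astype(float)

def two_means(X, centers, max_iter=100):
    """Lloyd iterations for k=2, started from the given deterministic centers."""
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def label_variance(y):
    """Assumed impurity measure: variance of the numerically encoded class labels."""
    return float(np.var(y))

def split_clusters(X, y, max_impurity=0.0, min_size=2):
    """Bisect clusters until each leaf is pure enough; returns a list of index arrays.

    max_impurity and min_size are hypothetical stopping parameters.
    """
    leaves, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        if len(idx) < min_size or label_variance(y[idx]) <= max_impurity:
            leaves.append(idx)
            continue
        labels = two_means(X[idx], pca_initial_centers(X[idx]))
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:
            # Degenerate split (e.g. identical points): keep the cluster as a leaf.
            leaves.append(idx)
            continue
        stack.extend([left, right])
    return leaves

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
leaves = split_clusters(X, y)
```

Because the initial centers are derived from the data rather than drawn at random, repeated runs on the same input produce the same leaves. Raising `max_impurity` above zero stops splitting earlier, which is one way the impurity threshold could limit over-fitting.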

Keywords: data mining; machine learning; supervised clustering; divisive hierarchical clustering

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]

 
