Research on K-means Algorithm for Optimizing Initial Center Based on UPGMA  Cited by: 2

Authors: Zhang Rui, Wang Yiwu, Zhu Xiaolong, Yin Jun, Han Chen, Yang Yuwang

Affiliation: [1] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210000

Source: Computer Technology and Development, 2018, No. 2, pp. 50-53, 58 (5 pages)

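Funding: National Natural Science Foundation of China (61640020); Jiangsu Provincial Science and Technology Support Program (BE2012386; BE2011342); Jiangsu Provincial Agricultural Independent Innovation Project (CX(13)3054; CX(16)1006); Jiangsu Provincial Key Research and Development Program (BE2016368-1)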

Abstract: The clustering quality of the traditional K-means algorithm depends heavily on the choice of initial cluster centers. To address this weakness, the OICC K-means algorithm is proposed. An improved unweighted pair group method with arithmetic mean (UPGMA) first merges the data in dense regions into a small set of points that reflect the data distribution. The max-min distance algorithm then selects mutually distant points from this set as the initial cluster centers of traditional K-means, so that K-means starts from an input that reflects the characteristics of the data distribution. Experiments on typical data sets show that OICC K-means clusters better than traditional K-means, with clear improvements in accuracy, recall, and F-measure. The first two stages of OICC K-means (UPGMA and the max-min distance algorithm) produce good initial centers drawn from dense regions, which avoids the adverse influence of noise data and edge data. As a result, K-means does not fall into a local optimum and achieves good overall clustering. The number of cluster centers is determined automatically by the algorithm and does not need to be set manually.

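Keywords: clustering; initial centers; unweighted pair group method with arithmetic mean (UPGMA); max-min distance algorithm; K-means algorithm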

Classification: TP301.6 [Automation and Computer Technology - Computer System Architecture]
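
The three-stage pipeline described in the abstract (merge dense regions, pick mutually distant candidates, run K-means) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not say how UPGMA was improved, so the sketch uses plain average-linkage merging with a hypothetical stopping distance `merge_radius` and drops groups smaller than a hypothetical `min_size`. The stopping ratio `theta` of the max-min distance step is likewise an assumed parameter.

```python
import numpy as np

def merge_dense_points(X, merge_radius, min_size=2):
    """Average-linkage (UPGMA-style) merging. Stopping at merge_radius and
    dropping groups below min_size are assumptions, not the paper's rules."""
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # UPGMA distance: mean of all cross-group point distances
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        if best > merge_radius:
            break
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    # One representative (the mean) per sufficiently large group
    return np.array([X[c].mean(axis=0) for c in clusters if len(c) >= min_size])

def max_min_centers(P, theta=0.5):
    """Max-min distance selection; the number of centers follows from theta."""
    chosen = [0]                                  # arbitrary first center
    min_d = np.linalg.norm(P - P[0], axis=1)      # distance to nearest chosen center
    first_gap = min_d.max()                       # distance between first two centers
    while len(chosen) < len(P):
        idx = int(np.argmax(min_d))
        if min_d[idx] == 0 or (len(chosen) >= 2 and min_d[idx] <= theta * first_gap):
            break
        chosen.append(idx)
        min_d = np.minimum(min_d, np.linalg.norm(P - P[idx], axis=1))
    return P[chosen]

def assign(X, centers):
    """Index of the nearest center for every point."""
    return np.argmin(np.linalg.norm(X[:, None] - centers[None, :], axis=2), axis=1)

def kmeans(X, centers, n_iter=100):
    """Standard Lloyd iterations started from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers), centers

# Toy data: three tight 3x3 grids of points plus one isolated noise point.
offsets = np.array([(dx, dy) for dx in (-0.2, 0.0, 0.2) for dy in (-0.2, 0.0, 0.2)])
X = np.vstack([offsets + c for c in ([0, 0], [10, 0], [0, 10])] + [np.array([[4.0, 4.0]])])

candidates = merge_dense_points(X, merge_radius=1.0)  # noise point forms no group
init_centers = max_min_centers(candidates, theta=0.5)
labels, centers = kmeans(X, init_centers)
print(len(centers), np.bincount(labels))
```

In this toy run the isolated point never becomes a candidate center, which mirrors the abstract's claim that centers come from dense regions, and the number of clusters comes out of the max-min step instead of being passed in. The pairwise search in `merge_dense_points` is written for readability and is only practical for small data sets.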