Affiliation: [1] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210000
Source: Computer Technology and Development (《计算机技术与发展》), 2018, No. 2, pp. 50-53, 58 (5 pages)
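Funding: National Natural Science Foundation of China (61640020); Jiangsu Province Science and Technology Support Program (BE2012386; BE2011342); Jiangsu Province Agricultural Independent Innovation Project (CX(13)3054; CX(16)1006); Jiangsu Province Key Research and Development Program (BE2016368-1)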
Abstract: The clustering quality of the traditional K-means algorithm depends heavily on its initial cluster centers. To address this weakness, the paper proposes the OICC K-means algorithm. An improved unweighted pair group method with arithmetic mean (UPGMA) first merges the data in dense regions into a set of points that reflect the data distribution. The max-min distance algorithm then selects mutually distant points from this set to serve as the initial cluster centers of traditional K-means, so that K-means starts from an input that reflects the characteristics of the data distribution. Experiments on typical data sets show that OICC K-means clusters better than traditional K-means, with clear improvements in accuracy, recall, and F-measure. The first two stages of OICC K-means (UPGMA and the max-min distance algorithm) produce good initial cluster centers drawn from the dense regions of the data, which avoids the adverse effects of noisy data and boundary data. As a result, K-means did not fall into a local optimum and achieved good overall clustering, and the number of cluster centers is determined automatically by the algorithm rather than set manually.
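The three stages named in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: this record gives no details of the improved UPGMA, so the first stage below uses plain UPGMA (average linkage) stopped at a distance threshold, and treats clusters with fewer than `min_size` members as sparse. All function names and the parameters `merge_threshold`, `min_size`, and `theta` are hypothetical.

```python
import math


def centroid(pts):
    """Mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def upgma_representatives(points, merge_threshold, min_size=2):
    """Stage 1 (assumption: plain UPGMA, not the paper's improved variant).
    Merge the two clusters with the smallest average pairwise distance until
    that distance exceeds merge_threshold; return centroids of dense clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = len(clusters[i]) * len(clusters[j])
                d = sum(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j]) / pairs
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > merge_threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    # Assumption: clusters below min_size are treated as noise and dropped.
    return [centroid(c) for c in clusters if len(c) >= min_size]


def max_min_centers(candidates, theta=0.5):
    """Stage 2: max-min distance selection. Keeps adding the candidate that is
    farthest from its nearest chosen center until that distance drops to
    theta times the distance between the first two centers."""
    centers = [candidates[0]]
    second = max(candidates, key=lambda p: math.dist(p, centers[0]))
    ref = math.dist(second, centers[0])
    if ref == 0:
        return centers
    centers.append(second)
    while True:
        nxt = max(candidates, key=lambda p: min(math.dist(p, c) for c in centers))
        if min(math.dist(nxt, c) for c in centers) <= theta * ref:
            return centers  # the number of centers is decided here, not by the caller
        centers.append(nxt)


def kmeans(points, centers, max_iter=100):
    """Stage 3: standard Lloyd iterations starting from the given centers."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            groups[nearest].append(p)
        new_centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


# Toy data: three tight groups plus one isolated point standing in for noise.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
        (5.0, 5.0), (5.1, 5.2), (5.2, 5.0),
        (10.0, 0.0), (10.1, 0.2), (10.2, 0.0),
        (5.0, -8.0)]
reps = upgma_representatives(data, merge_threshold=1.0)
init = max_min_centers(reps, theta=0.5)
centers, groups = kmeans(data, init)
print(len(init), [len(g) for g in groups])
```

In this sketch the isolated point never becomes a candidate center because its singleton cluster is filtered out in stage 1, and the number of clusters falls out of the `theta` stopping rule in stage 2 rather than being passed to K-means. The pairwise loop in stage 1 is cubic in the number of points, so it is suitable only for small illustrations.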
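Keywords: clustering; initial centers; unweighted pair group method with arithmetic mean (UPGMA); max-min distance algorithm; K-means algorithm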
Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]