Authors: SHEN Guo-xin (沈郭鑫) [1]; JIANG Zhong-yun (蒋中云) [2]
Affiliations: [1] College of Information, Shanghai Ocean University, Shanghai 201306, China; [2] College of Information Technology, Shanghai Jian Qiao University, Shanghai 201306, China
Source: Computer Engineering & Science (《计算机工程与科学》), 2022, No. 2, pp. 372-380 (9 pages)
Funding: Fund for Applied Undergraduate Pilot Programs at Shanghai Municipal Universities (Z32004-17-84).
Abstract: The bisecting K-means algorithm produces unstable clustering results because its initial centers are chosen at random and its number of clusters is set by hand. To address this problem, a Canopy bisecting K-means algorithm based on density and a center index, SDC_Bisecting K-Means, is proposed. First, the algorithm computes the density of the data in the sample and the corresponding neighborhood radius. Second, it selects the data with the smallest density and clusters them following the idea of the Canopy algorithm; the resulting number of clusters and their centers serve as the input parameters of the bisecting K-means algorithm. Finally, building on the bisecting K-means algorithm, it introduces an exponential function and a center index to cluster the original samples. Simulation experiments on UCI data sets and a self-built data set show that SDC_Bisecting K-Means yields more accurate clustering results, runs faster, and is more stable.
Keywords: clustering; bisecting K-means algorithm; density; neighborhood radius; exponential function; center index
Classification code: TP391 [Automation and Computer Technology / Computer Application Technology]
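
The pipeline outlined in the abstract can be roughly illustrated as follows. This is only a sketch under stated assumptions, not the authors' SDC_Bisecting K-Means. The abstract does not define the density, the neighborhood radius, the exponential function, or the center index, so the sketch takes density to be the neighbor count within a caller-supplied `radius`, uses a Canopy-style pass only to estimate the number of clusters, and then runs plain bisecting K-means. The names `canopy_seeds`, `two_means`, and `bisecting_kmeans` are hypothetical.

```python
import numpy as np

def canopy_seeds(X, radius):
    """Canopy-style pass: repeatedly take the lowest-density remaining point
    as a center and remove every remaining point within `radius` of it.
    ASSUMPTION: density = number of points within `radius`; the abstract
    does not state the paper's actual definition."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    density = (dist <= radius).sum(axis=1)
    remaining = np.ones(len(X), dtype=bool)
    centers = []
    while remaining.any():
        idx = np.flatnonzero(remaining)
        pick = idx[np.argmin(density[idx])]           # lowest-density point
        members = remaining & (dist[pick] <= radius)  # its canopy (includes pick)
        centers.append(X[members].mean(axis=0))
        remaining &= ~members
    return np.array(centers)

def two_means(X, rng, iters=20):
    """Plain 2-means; returns a boolean mask marking the second cluster.
    Assumes X holds at least two distinct points."""
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        mask = d[:, 1] < d[:, 0]
        centers = np.array([X[~mask].mean(axis=0), X[mask].mean(axis=0)])
    return mask

def bisecting_kmeans(X, k, seed=0):
    """Standard bisecting K-means: split the cluster with the largest
    sum of squared errors until k clusters exist. The paper's exponential
    function and center index are NOT modeled here."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        sse = [((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(new_label)]
        idx = np.flatnonzero(labels == int(np.argmax(sse)))
        mask = two_means(X[idx], rng)
        labels[idx[mask]] = new_label
    return labels

# Usage: two small, well-separated groups of points.
square = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X = np.vstack([square, square + 10.0])
seeds = canopy_seeds(X, radius=2.0)
labels = bisecting_kmeans(X, k=len(seeds))
```

In this sketch only the number of canopies is passed on to bisecting K-means. The abstract states that the canopy centers are passed as well, but how they are used is not described, so they are left unused here.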