Authors: ZOU Chen-Song (邹臣嵩) [1], DUAN Gui-Qin (段桂芹) [2]
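Affiliations: [1] Department of Electrical Engineering, Guangdong Songshan Polytechnic, Shaoguan 512126, China; [2] Department of Computer Science, Guangdong Songshan Polytechnic, Shaoguan 512126, China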
Source: Computer Systems & Applications (《计算机系统应用》), 2019, No. 6, pp. 235-242 (8 pages)
Funding: Shaoguan Science and Technology Plan Project (2017CX/K055); Key Science and Technology Project of Guangdong Songshan Polytechnic (2018KJZD001)
Abstract: To better evaluate the clustering quality of unsupervised clustering algorithms, and to address the failure of clustering evaluation results caused by overlapping cluster centers, the commonly used clustering evaluation indices are analyzed and a new internal evaluation index is proposed. The index takes the product of the sum of squared minimum distances between adjacent boundary points of different clusters and the number of samples in the cluster as the separation of the whole sample set, which balances inter-cluster separation against intra-cluster compactness. A new density calculation method is also proposed: objects for which the ratio of the average distance over the sample set to the object's own average distance is large are treated as high-density points, and the maximum product method is used to select relatively dispersed, high-density data objects as the initial cluster centers. This improves the representativeness of the initial medoids of the K-medoids algorithm and the stability of the algorithm. On this basis, a clustering quality evaluation model is designed around the new internal evaluation index. Experimental results on the UCI and KDD CUP 99 data sets show that the new model can effectively cluster and reasonably evaluate samples without prior knowledge, and can give the optimal number of clusters or an optimal range for it.
Keywords: clustering evaluation index; K-medoids; unsupervised clustering; optimal number of clusters
CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
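The abstract describes its initialization only briefly. The Python sketch below shows one way a density ratio and a maximum-product seeding rule could feed a K-medoids run. It is not the authors' implementation, and several parts are assumptions. The density formula (mean pairwise distance of the whole set divided by a sample's own mean distance to the others) is our reading of the abstract. The maximum-product rule is an assumed common variant: the first medoid is the densest sample, and each further medoid maximizes density times distance to the nearest medoid already chosen. The clustering step is the simple alternating K-medoids iteration rather than PAM. All function names (`densities`, `init_medoids`, `assign`, `k_medoids`) are hypothetical, and the code assumes distinct samples with at least `k` of them. The paper's new internal evaluation index is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def densities(data: List[Point]) -> List[float]:
    """Assumed density: mean pairwise distance of the whole set divided by the
    sample's own mean distance to the others (larger ratio = denser region)."""
    n = len(data)
    # Row i holds the distances from sample i to every sample (itself included).
    dist = [[math.dist(a, b) for b in data] for a in data]
    # Self-distance is 0, so dividing the row sum by n - 1 averages over the others.
    per_sample = [sum(row) / (n - 1) for row in dist]
    # Mean of the per-sample means equals the mean over all ordered pairs i != j.
    overall = sum(per_sample) / n
    return [overall / m for m in per_sample]


def init_medoids(data: List[Point], k: int) -> List[int]:
    """Hypothetical maximum-product seeding: start from the densest sample, then
    repeatedly add the sample maximising density * distance to its nearest
    already-chosen medoid."""
    rho = densities(data)
    chosen = [max(range(len(data)), key=lambda i: rho[i])]
    while len(chosen) < k:
        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(
            candidates,
            key=lambda i: rho[i] * min(math.dist(data[i], data[c]) for c in chosen),
        ))
    return chosen


def assign(data: List[Point], medoids: List[int]) -> List[int]:
    """Label each sample with the position (0..k-1) of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: math.dist(p, data[medoids[c]]))
        for p in data
    ]


def k_medoids(data: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating K-medoids (not PAM) seeded by init_medoids."""
    medoids = init_medoids(data, k)
    for _ in range(max_iter):
        labels = assign(data, medoids)
        updated = []
        for c in range(k):
            members = [i for i, label in enumerate(labels) if label == c]
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(
                members,
                key=lambda m: sum(math.dist(data[m], data[j]) for j in members),
            ))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(data, medoids)


if __name__ == "__main__":
    # Two well-separated 2 x 2 grids of points.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    medoids, labels = k_medoids(data, 2)
    print(medoids, labels)
```

On the two grids in the example, each grid ends up in its own cluster. The reported medoid indices depend on tie-breaking among equally central points. Because the seeding rule multiplies density by distance, the second seed tends to be a far-away point rather than the densest one in the other group. The subsequent K-medoids updates then move it to a central member.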