检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]陕西师范大学计算机科学学院,陕西西安710119
出 处:《陕西师范大学学报(自然科学版)》2015年第6期1-8,共8页Journal of Shaanxi Normal University:Natural Science Edition
基 金:国家自然科学基金(31372250);陕西省科技攻关项目(2013K12-03-24);中央高校基本科研业务费专项资金(GK201503067)
摘 要:用于发现数据集类簇数k的常用内部评价指标DB(Davies Bouldin)和BWP(Between-within Proportion)等需要先确定一个搜索范围kmax,使数据集的类簇数满足k≤kmax,但如何确定kmax尚无理论指导。针对这一问题,提出一个新F统计量Fr,将Fr作为新聚类有效性准则,以判断聚类算法收敛与否,自适应地确定数据集类簇数;将Fr应用于快速K-medoids算法的收敛性判断,并以基于最小生成树的测地距离,即样本对在最小生成树上的路径长度,代替其间的直接欧氏距离度量样本相似性,得到一种自适应的快速K-medoids聚类算法,解决了K-medoids算法需要人为给定类簇数和不能发现任意形状簇的问题。UCI机器学习数据库数据集和人工模拟数据集实验测试表明,本文提出的Fr指标是一种有效的聚类算法评价指标,基于该指标和测地距离的K-medoids算法不仅能发现任意形状的簇,还可以自适应地确定数据集的类簇数,且对噪音数据有很好的鲁棒性。The internal evaluation criteria,such as DB and BWP et al,need the upper bound of kmaxto be given in advance when these criteria are used to determine the number of clusters of a dataset,so that the number of clusters of kin a dataset satisfies k≤kmax.However,there is not so far a theory about how to find the upper bound of kmax.Aiming at this problem,a new F-statistics Fris proposed in this paper.Frcan be used as a new criterion to judge whether a clustering algorithm is converged or not,so that the number of clusters of a dataset can be found automatically without the upper bound of kmaxto be given in advance.In addition,Fris applied to the fast K-medoids clustering algorithm as its criterion to judge whether its iterations continues or not,and the geodesic distance between samples on the minimum spanning tree(MST),that is the length of the path between samples on MST,is introduced to instead of the direct Euclidean distance between them to measure their similarities,so that an adaptive fast K-medoids algorithm is obtained.The adaptive K-medoids algorithm avoids the deficiencies of the available K-medoids algorithms which need the number of clusters to be given in advance and cannot detect the clusters with any arbitrary shape.The power of the proposed criterion Frand the power of the adaptive fast K-medoids algorithm are tested in real world datasets from UCI machine learning repository and in the synthetic datasets.All the experimental results demonstrate that our proposed criterion Fris a valid clustering criterion,and the adaptive fast K-medoids algorithm based on the Frand geodesic distance can recognize the clusters with arbitrary shape,and can detect the number of clusters automatically,and is robust to noises as well.
关 键 词:F统计量 内部评价指标 类簇数 K-medoids聚类算法 最小生成树
分 类 号:TP181.1[自动化与计算机技术—控制理论与控制工程]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.249