Design and Implementation of an Effective Clustering Algorithm Using Representative Points  Cited by: 10

THE DESIGN AND IMPLEMENTATION OF CLUSTERING ALGORITHM USING REPRESENTATIVE DATA


Authors: Chen Enhong (陈恩红)[1], Wang Shangfei (王上飞)[1], Ning Yan (宁岩)[1], Wang Xufa (王煦法)[1]

Affiliation: [1] Department of Computer Science, University of Science and Technology of China, Hefei 230027

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2001, No. 4, pp. 417-422 (6 pages)

Funding: National Natural Science Foundation of China

Abstract: Traditional clustering algorithms tend to identify spherical clusters of similar size and are sensitive to outliers. To address these problems, this paper designs an effective clustering algorithm based on selecting representative points for each cluster. The method first selects a number of well-scattered data points from each cluster and then shrinks them toward the cluster's centroid; the resulting points serve as the cluster's representatives. In each hierarchical clustering step, the pair of clusters with the smallest distance among all pairs is merged into one cluster. The distance between two clusters is taken as the smallest distance over all pairs of representative points, one from each cluster. By selecting multiple representative points in this way, the algorithm can capture the geometric features of clusters of different shapes and sizes and is less affected by outliers. Experimental results show that the algorithm is effective on complex data.
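The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the scattered points are chosen or how strongly they are shrunk, so the farthest-point heuristic, the parameter names `num_reps` and `shrink`, and their default values are all assumptions made for illustration. The sketch recomputes representatives at every merge step for clarity rather than speed.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def representatives(points, num_reps=4, shrink=0.3):
    """Pick up to num_reps well-scattered points, then shrink them toward the centroid.

    Assumption: scattered points are chosen with a farthest-point heuristic,
    and `shrink` is the fraction of the way each point moves to the centroid.
    """
    center = centroid(points)
    # Start from the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, center))]
    while len(chosen) < num_reps:
        # Next: the point whose nearest already-chosen point is farthest away.
        best = max(points, key=lambda p: min(math.dist(p, q) for q in chosen))
        if min(math.dist(best, q) for q in chosen) == 0.0:
            break  # no distinct points left in this cluster
        chosen.append(best)
    # Move each chosen point toward the centroid by the shrink fraction.
    return [tuple(x + shrink * (m - x) for x, m in zip(p, center)) for p in chosen]


def cluster_distance(reps_a, reps_b):
    """Smallest distance over all pairs of representatives, one from each cluster."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


def cluster(points, k, num_reps=4, shrink=0.3):
    """Agglomerative clustering: merge the closest pair of clusters until k remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > k:
        reps = [representatives(c, num_reps, shrink) for c in clusters]
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(reps[ij[0]], reps[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for group in cluster(data, k=2):
        print(sorted(group))
```

Shrinking pulls the representatives away from the cluster boundary, which is one plausible reading of why the abstract reports reduced sensitivity to outliers; using several representatives instead of a single center is what lets the minimum-distance rule follow non-spherical shapes.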

Keywords: hierarchical clustering; representative points; effective clustering algorithm; data mining; pattern recognition

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
