An unsupervised grid-based approach for clustering analysis 被引量：7

An unsupervised grid-based approach for clustering analysis

作　　者：YUE ShiHong WANG JeenShing TAO Gao WANG HuaXiang

机构地区：[1]School of Electrical Engineering and Automation, Tianjin Key Laboratory of Process Measurement and Control, Tianjin University, Tianjin 300072, China [2]Department of Electrical Engineering, National Cheng Kung University, Tainan 701, China

出　　处：《Science China(Information Sciences)》2010年第7期1345-1357,共13页中国科学（信息科学）（英文版）

基　　金：supported by the National Natural Science Foundation of China (Grant Nos. 60772080, 60572065,50974095);the Tianjin Science Foundation (Grant No. 08JCYBJC13800)

摘　　要：In recent years, the growing volume of data in numerous clustering tasks has greatly boosted the existing clustering algorithms in dealing with very large datasets. The K-means has been one of the most popular clustering algorithms because of its simplicity and easiness in application, but its efficiency and effectiveness for large datasets are often unacceptable. In contrast to the K-means algorithm, most existing grid-clustering algorithms have linear time and space complexities and thus can perform well for large datasets. In this paper, we propose a gridbased partitional algorithm to overcome the drawbacks of the K-means clustering algorithm. This new algorithm is based on two major concepts： 1） maximizing the average density of a group of grids instead of minimizing the minimal square error which is applied in the K-means algorithm, and 2） using grid- clustering algorithms to thoroughly reformulate the object-driven assigning in the K-means algorithm into a new grid-driven assigning. Consequently, our proposed algorithm obtains an average speed-up about 10-100 times faster and produces better partitions than those by the K-means algorithm. Also, compared with the K-means algorithm, our proposed algorithm has ability to partition any dataset when the number of clusters is unknown. The effectiveness of our proposed algorithm has been demonstrated through successfully clustering datasets with different features in comparison with the other three typical clustering algorithms besides the K-means algorithm.In recent years, the growing volume of data in numerous clustering tasks has greatly boosted the existing clustering algorithms in dealing with very large datasets. The K-means has been one of the most popular clustering algorithms because of its simplicity and easiness in application, but its efficiency and effectiveness for large datasets are often unacceptable. In contrast to the K-means algorithm, most existing grid-clustering algorithms have linear time and space complexities and thus can perform well for large datasets. In this paper, we propose a gridbased partitional algorithm to overcome the drawbacks of the K-means clustering algorithm. This new algorithm is based on two major concepts： 1） maximizing the average density of a group of grids instead of minimizing the minimal square error which is applied in the K-means algorithm, and 2） using grid- clustering algorithms to thoroughly reformulate the object-driven assigning in the K-means algorithm into a new grid-driven assigning. Consequently, our proposed algorithm obtains an average speed-up about 10-100 times faster and produces better partitions than those by the K-means algorithm. Also, compared with the K-means algorithm, our proposed algorithm has ability to partition any dataset when the number of clusters is unknown. The effectiveness of our proposed algorithm has been demonstrated through successfully clustering datasets with different features in comparison with the other three typical clustering algorithms besides the K-means algorithm.

关键词：CLUSTERING grid-based algorithm K-means algorithm

分类号：TP391.41[自动化与计算机技术—计算机应用技术] O212.4[自动化与计算机技术—计算机科学与技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

An unsupervised grid-based approach for clustering analysis 被引量：7

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

An unsupervised grid-based approach for clustering analysis 被引量：7

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索