An Accelerated K-means Algorithm Based on the Triangle Inequality  Cited by: 4

K-means algorithm based on triangle inequality


Authors: 常晋义[1], 何春霞[1]

Affiliation: [1] Department of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2007, No. 21, pp. 5094-5096 (3 pages)

Funding: Natural Science Research Program of Jiangsu Higher Education Institutions (03KJD51002).

Abstract: The K-means clustering algorithm is simple, fast, and very widely used, but its running time still leaves room for improvement on massive data sets. When a data point is far from a cluster center, the exact distance between them need not be computed to establish that the point does not belong to that cluster. This paper applies the triangle inequality to K-means to avoid such redundant distance calculations. Experiments show a substantial speedup that grows with the size of the data set, is greater for higher-dimensional data, and increases with the number of clusters, while the accelerated algorithm always produces exactly the same result as standard K-means.
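The pruning idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm, whose details the abstract does not give; it only shows one standard consequence of the triangle inequality. For a point x, its current best center b, and another center c, d(b, c) <= d(b, x) + d(x, c), so whenever d(b, c) >= 2 d(x, b) it follows that d(x, c) >= d(x, b) and the distance to c never needs to be computed. The function name `assign_with_pruning`, the variable names, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping provably farther centers."""
    k = len(centers)
    # Center-to-center distances: k*k values, computed once per assignment pass.
    center_dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.empty(len(points), dtype=int)
    computed = 0  # number of point-to-center distances actually evaluated
    for i, x in enumerate(points):
        best = 0
        best_d = np.linalg.norm(x - centers[0])
        computed += 1
        for j in range(1, k):
            # Triangle inequality: if d(c_best, c_j) >= 2 * d(x, c_best),
            # then d(x, c_j) >= d(x, c_best), so c_j cannot be strictly closer.
            if center_dist[best, j] >= 2.0 * best_d:
                continue
            d = np.linalg.norm(x - centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels[i] = best
    return labels, computed

# Illustrative data (an assumption): four well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
points = np.concatenate([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
labels, computed = assign_with_pruning(points, centers)
print(f"distances computed: {computed} of {len(points) * len(centers)}")
```

Because a center is skipped only when it is provably no closer than the current best, the labels match an exhaustive nearest-center search, which is consistent with the abstract's claim that the result is identical to standard K-means. The saving in this sketch depends on how well separated the centers are relative to the point-to-center distances.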

Keywords: K-means algorithm; partitional clustering; triangle inequality; cluster analysis; clustering algorithm; clustering quality

Classification: TP301.6 [Automation and Computer Technology: Computer Systems Architecture]

 
