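Affiliation: [1] Department of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2007, Issue 21, pp. 5094-5096 (3 pages)
Funding: Natural Science Research Program of Jiangsu Provincial Universities (03KJD51002).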
Abstract: The K-means clustering algorithm is simple and fast and is very widely used, but its running time still leaves room for improvement on massive datasets. When a data point is far from a cluster, the exact distance between the two need not be computed to establish that the point does not belong to that cluster. The algorithm is improved by applying the triangle inequality, which avoids redundant distance calculations. Experiments show a substantial speedup that grows with the size of the dataset, while the clustering quality retains the accuracy of the original algorithm.
English abstract: The K-means algorithm is by far the most widely used method for discovering clusters in data. However, when faced with large-scale data, its efficiency needs to be improved. If a point is far away from a center, it is not necessary to calculate the exact distance between the point and the center in order to know that the point should not be assigned to this center. This paper shows how to accelerate the algorithm dramatically. The accelerated algorithm avoids unnecessary distance calculations by applying the triangle inequality. Experiments show that the new algorithm is more effective for datasets with more dimensions and becomes increasingly effective as the number of clusters increases, while always producing exactly the same result as the standard K-means algorithm.
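Keywords: K-means algorithm; partitional clustering; triangle inequality; cluster analysis; clustering algorithm; clustering quality
Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]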
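The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, whose exact bounds the record does not give. It assumes one standard consequence of the triangle inequality: if a point x currently has nearest center c, and another center c' satisfies d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so d(x, c') need not be computed. The function names `assign_with_pruning` and `assign_brute_force` and the sample data are hypothetical, chosen only for illustration.

```python
import math


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping centers that the
    triangle inequality proves cannot be closer than the current best.

    Returns (labels, number_of_point_to_center_distances_computed).
    """
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels = []
    computed = 0
    for x in points:
        best = 0
        best_d = math.dist(x, centers[0])
        computed += 1
        for j in range(1, k):
            # d(x, c_j) >= d(c_best, c_j) - d(x, c_best), so if
            # d(c_best, c_j) >= 2 * d(x, c_best) then c_j is not closer.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = math.dist(x, centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


def assign_brute_force(points, centers):
    """Reference assignment: compute every point-to-center distance."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
        for x in points
    ]


# Illustrative data (assumed, not from the paper).
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (0.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

labels, computed = assign_with_pruning(points, centers)
print(labels, computed, "of", len(points) * len(centers))
```

Because a center is skipped only when it is provably no closer than the current best, the pruned pass returns the same labels as the brute-force pass while computing fewer point-to-center distances. This matches the abstract's claim that the speedup does not change the clustering result. A full accelerated K-means would typically also carry distance bounds across iterations, which this sketch omits.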