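Affiliation: [1] School of Computer Science, Shaanxi Normal University, Xi'an 710062
Source: Computer Engineering (《计算机工程》), 2014, No. 8, pp. 205-211, 223 (8 pages in total)
Funding: National Natural Science Foundation of China (31372250); Shaanxi Province Science and Technology Research Program (2013K12-03-24); Fundamental Research Funds for the Central Universities (GK201102007)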
Abstract: The traditional K-means algorithm selects its initial cluster centers at random, which tends to make the clustering result unstable, while K-means variants that optimize the initial centers depend on parameter choices that make their results less objective. To address both problems, this paper proposes a K-means algorithm that optimizes the initial cluster centers by minimum variance, based on the compactness of the samples' spatial distribution. The algorithm computes a variance for each sample from the spatial distribution of the data as a measure of compactness: the smaller the variance, the more densely other samples gather around that sample. It then selects as initial cluster centers the K samples that have the smallest variance (that is, the highest compactness) and are separated from one another by a certain distance. Experiments on data sets from the UCI machine learning repository and on synthetic data sets containing noise show that the algorithm not only achieves good clustering results but also produces stable results and is robust to noise.
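Keywords: clustering; K-means algorithm; variance; compactness; initial cluster centers
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]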
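
The seeding idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not give the variance formula or the separation distance, so the sketch assumes that a sample's variance is its mean squared Euclidean distance to all other samples and that the required separation defaults to the mean pairwise distance of the data set. The function name `min_variance_centers` and the `radius` parameter are hypothetical.

```python
import math


def min_variance_centers(points, k, radius=None):
    """Pick up to k initial centers: low-variance samples that are mutually separated.

    Assumptions (not taken from the paper): the per-sample variance and the
    default radius are defined as described in the comments below.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")

    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed variance of sample i: mean squared distance to the other samples.
    variance = [sum(d * d for d in row) / (n - 1) for row in dist]

    if radius is None:
        # Assumed separation threshold: mean pairwise distance of the data set.
        radius = sum(sum(row) for row in dist) / (n * (n - 1))

    chosen = []
    # Visit samples from lowest to highest variance and keep a sample only
    # if it lies at least `radius` away from every center chosen so far.
    for i in sorted(range(n), key=lambda idx: variance[idx]):
        if all(dist[i][j] >= radius for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return [points[i] for i in chosen]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    print(min_variance_centers(data, 2))
```

The returned samples would then be passed to an ordinary K-means loop as its starting centers. Because the selection is deterministic, repeated runs on the same data start from the same centers, which is the stability property the abstract emphasizes. If fewer than k samples satisfy the separation constraint, this sketch returns fewer than k centers; a smaller `radius` can be passed in that case.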