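Author: Gu Linglan (古凌岚) [1]
Affiliation: [1] Department of Computer Engineering, Guangdong Industry Polytechnic (广东轻工职业技术学院), Guangzhou, Guangdong 510300, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2014, No. 6, pp. 2183-2187 (5 pages)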
Abstract: The traditional fuzzy C-means algorithm cannot cope with the sheer volume and the redundant attributes of large-scale datasets. To address this, a hybrid clustering algorithm for large datasets is proposed. The large dataset is divided into multiple subsets, each subset is clustered separately, and the partial results are merged to obtain the final clustering. Each subset is clustered with a hybrid of gene expression programming (GEP) and fuzzy C-means, which improves clustering quality and efficiency. Initial cluster centers are selected on the basis of similarity, and information entropy is used to reflect the importance of each attribute, which further improves clustering performance. Simulation experiments and analysis show that the algorithm has good global convergence and yields better clustering results.
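Keywords: large datasets; fuzzy C-means; gene expression programming; attribute information entropy; clustering
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]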
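
The pipeline outlined in the abstract (weight attributes by entropy, cluster each subset with fuzzy C-means, then merge) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: the GEP component is omitted entirely; `entropy_weights` follows the common entropy-weight convention (lower histogram entropy gives a higher weight); `init_centers` uses a farthest-point rule as a stand-in for the similarity-based selection; and `cluster_large` merges by re-clustering the per-subset centers. All function names and parameters are hypothetical.

```python
import math


def entropy_weights(data, bins=4):
    """Assumed scheme: weight_j is proportional to 1 - normalized histogram entropy."""
    n, d = len(data), len(data[0])
    raw = []
    for j in range(d):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0  # guard against a constant attribute
        counts = [0] * bins
        for v in col:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        h = -sum(c / n * math.log(c / n) for c in counts if c)
        raw.append(1.0 - h / math.log(bins) + 1e-9)
    total = sum(raw)
    return [r / total for r in raw]


def weighted_dist2(x, c, w):
    """Attribute-weighted squared Euclidean distance."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def init_centers(data, k, w):
    """Assumed stand-in: start at the first point, then add the farthest point."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(
            max(data, key=lambda x: min(weighted_dist2(x, c, w) for c in centers))
        )
    return centers


def fuzzy_c_means(data, k, w, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy C-means updates with fuzzifier m; returns the k centers."""
    centers = init_centers(data, k, w)
    dims = len(data[0])
    for _ in range(iters):
        # Membership update: u_i is proportional to d_i^(-2/(m-1)).
        u = []
        for x in data:
            d2 = [max(weighted_dist2(x, c, w), 1e-12) for c in centers]
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
            s = sum(inv)
            u.append([v / s for v in inv])
        # Center update: mean of the points weighted by u^m.
        new_centers = []
        for i in range(k):
            um = [row[i] ** m for row in u]
            denom = sum(um)
            new_centers.append(
                [sum(a * x[j] for a, x in zip(um, data)) / denom for j in range(dims)]
            )
        shift = max(weighted_dist2(a, b, w) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def cluster_large(data, k, n_subsets=2):
    """Split into subsets, cluster each, then merge by clustering the sub-centers."""
    w = entropy_weights(data)
    subsets = [data[i::n_subsets] for i in range(n_subsets)]
    sub_centers = [c for s in subsets for c in fuzzy_c_means(s, k, w)]
    return fuzzy_c_means(sub_centers, k, w)
```

Calling `cluster_large(points, k=2, n_subsets=2)` on a list of equal-length numeric rows returns two center vectors. The sketch assumes every subset holds at least `k` points and does no input validation.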