Author: ZHONG Yi (钟毅) [1] (China Union Pay Co., Ltd., Shanghai 201201, China)
Source: 《软件产业与工程》, 2016, No. 6, pp. 50-53 (4 pages)
Abstract: The traditional fuzzy C-means (FCM) clustering algorithm treats every attribute equally and selects its initial cluster centers at random. To address these two limitations, an improved FCM clustering algorithm based on correlation-coefficient optimization is proposed. First, the algorithm determines the weight of each attribute by computing the coefficient of variation (离散系数) and the information entropy of the data set, which strengthens the influence of important attributes on clustering, weakens that of redundant attributes, and improves the clustering result. Second, the correlation coefficient and a density function are used to determine the density of each sample point, which highlights the role of a sample point within its own category. Third, the normalized density of each sample point is used as that sample's weight. Finally, the initial cluster centers are determined from the correlation coefficient and the sample point densities. Experimental results show that the algorithm achieves a better clustering result than the traditional FCM algorithm.
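The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of attribute-weighted FCM, not the author's method. It assumes the widely used entropy weight method for the attribute weights (the paper combines entropy with the coefficient of variation in a way the abstract does not specify) and a weighted squared Euclidean distance. The density-based sample weights and center initialization are omitted, so initial centers are passed in by the caller. The function names `entropy_weights` and `weighted_fcm` are hypothetical.

```python
import math

def entropy_weights(data):
    """Entropy weight method (an assumption, not the paper's exact formula):
    attributes with lower entropy across samples receive larger weights."""
    n, d = len(data), len(data[0])
    raw = []
    for j in range(d):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        if hi == lo:
            raw.append(0.0)  # a constant attribute carries no information
            continue
        scaled = [(v - lo) / (hi - lo) for v in col]  # min-max scale to [0, 1]
        total = sum(scaled)
        p = [s / total for s in scaled]
        ent = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)
        raw.append(1.0 - ent)
    s = sum(raw)
    # Fall back to uniform weights if no attribute is informative.
    return [r / s for r in raw] if s > 0 else [1.0 / d] * d

def weighted_fcm(data, centers, weights, m=2.0, iters=100, tol=1e-6):
    """Standard FCM updates with an attribute-weighted squared distance."""
    c, d = len(centers), len(weights)
    centers = [list(ctr) for ctr in centers]
    u = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to D_ik ** (-1 / (m - 1)).
        u = []
        for x in data:
            dist = [sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, ctr))
                    for ctr in centers]
            if min(dist) == 0.0:
                # The sample coincides with a center: crisp membership.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in dist]
                z = sum(hits)
                u.append([h / z for h in hits])
            else:
                inv = [dk ** (-1.0 / (m - 1.0)) for dk in dist]
                z = sum(inv)
                u.append([v / z for v in inv])
        # Center update: mean of the samples weighted by u_ik ** m.
        new_centers = []
        for k in range(c):
            um = [row[k] ** m for row in u]
            denom = sum(um)
            if denom == 0.0:
                new_centers.append(centers[k])  # keep an empty cluster's center
                continue
            new_centers.append(
                [sum(wi * x[j] for wi, x in zip(um, data)) / denom
                 for j in range(d)])
        shift = max(abs(a - b)
                    for nc, oc in zip(new_centers, centers)
                    for a, b in zip(nc, oc))
        centers = new_centers
        if shift < tol:
            break
    # u holds the memberships computed in the last iteration.
    return centers, u

if __name__ == "__main__":
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    w = entropy_weights(points)
    final_centers, memberships = weighted_fcm(points, [[0, 0], [10, 10]], w)
    print(w, final_centers)
```

In this sketch a per-sample weight, such as the normalized density the abstract describes, could be multiplied into `um` in the center update, but how the paper does this is not stated in the abstract.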
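Keywords: fuzzy C-means algorithm; coefficient of variation; information entropy; attribute weight; correlation coefficient; density function
Classification code: TP301.6 [Automation and Computer Technology - Computer System Architecture]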