A Fuzzy C-Means Clustering Algorithm Based on Correlation Coefficient  (Cited by: 1)


Author: 钟毅 (ZHONG Yi)[1] (China UnionPay Co., Ltd., Shanghai 201201, China)

Affiliation: [1] China UnionPay Co., Ltd., Shanghai 201201, China

Source: 《软件产业与工程》 (Software Industry and Engineering), 2016, No. 6, pp. 50-53 (4 pages)

Abstract: The traditional fuzzy C-means (FCM) clustering algorithm has two limitations: it treats every attribute equally, and it selects its initial cluster centers at random. To address both, an improved FCM clustering algorithm optimized by the correlation coefficient is proposed. First, the algorithm determines the weight of each attribute by computing the coefficient of variation (discrete coefficient) and the information entropy, which strengthens the influence of important attributes on the clustering process, weakens the effect of redundant attributes, and improves the clustering result. Second, the correlation coefficient and a density function are used to determine the density of each sample point, which highlights the role of a sample point within its own category. Third, the normalized sample point density is used as the weight of each sample. Finally, the initial cluster centers are determined from the correlation coefficient and the sample point densities. Experimental results show that the algorithm achieves a better clustering result than the traditional FCM algorithm.

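Keywords: fuzzy C-means algorithm; coefficient of variation (discrete coefficient); information entropy; attribute weight; correlation coefficient; density function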

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]
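
The four steps named in the abstract (attribute weights, sample density, sample weights, initial centers) can be illustrated with a rough sketch. The abstract gives no formulas, so everything specific below is an assumption rather than the paper's method: the equal-weight average of the coefficient-of-variation and entropy weights, the density defined as the mean positive Pearson correlation with the other samples, the greedy center-selection rule, and all function names (`attribute_weights`, `correlation_density`, `initial_centers`, `weighted_fcm`). The sketch also assumes at least three attributes, none of them constant and none with zero mean.

```python
import numpy as np

def attribute_weights(X):
    """Attribute weights from coefficient of variation and entropy (assumed: simple average)."""
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))      # coefficient of variation per attribute
    w_cv = cv / cv.sum()
    # Entropy weight method: min-max scale each column, then treat it as a distribution.
    Z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) + 1e-12
    P = Z / Z.sum(axis=0)
    entropy = -(P * np.log(P)).sum(axis=0) / np.log(len(X))
    d = 1.0 - entropy                                 # lower entropy -> more informative
    w = 0.5 * (w_cv + d / d.sum())
    return w / w.sum()

def correlation_density(X, w):
    """Assumed density: mean positive correlation of each sample with all other samples."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize attributes first
    R = np.corrcoef(Xs * w)                           # n x n correlations between samples
    dens = (np.clip(R, 0.0, None).sum(axis=1) - 1.0) / (len(X) - 1)
    return R, dens / dens.sum()                       # normalized densities sum to 1

def initial_centers(X, R, dens, c):
    """Assumed greedy rule: dense samples that correlate weakly with chosen centers."""
    chosen = [int(np.argmax(dens))]
    while len(chosen) < c:
        score = dens * (1.0 - R[:, chosen].max(axis=1))
        score[chosen] = -np.inf                       # never pick the same sample twice
        chosen.append(int(np.argmax(score)))
    return X[chosen].copy()

def weighted_fcm(X, c, m=2.0, iters=100, tol=1e-6):
    """FCM with attribute-weighted distances and density-weighted center updates."""
    X = np.asarray(X, dtype=float)
    w = attribute_weights(X)
    R, dens = correlation_density(X, w)
    V = initial_centers(X, R, dens, c)
    for _ in range(iters):
        # Squared attribute-weighted Euclidean distance, shape (n, c).
        D = (((X[:, None, :] - V[None, :, :]) ** 2) * w).sum(axis=2) + 1e-12
        U = 1.0 / D ** (1.0 / (m - 1.0))              # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
        G = dens[:, None] * U ** m                    # sample density acts as sample weight
        V_new = (G.T @ X) / G.sum(axis=0)[:, None]
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return U, V, w
```

Calling `U, V, w = weighted_fcm(X, c=2)` on an `(n, p)` array returns the membership matrix, the cluster centers, and the attribute weights; `U.argmax(axis=1)` gives hard labels. Setting every attribute weight and every sample density to a constant reduces the loop to plain FCM, which makes a comparison with the traditional algorithm straightforward.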

 
