Improved k-prototypes clustering algorithm based on average difference degree

Cited by: 4


Authors: SHI Hong-yan[1], XU Ming-ming (School of Science, Shenyang University of Technology, Shenyang 110870, China)

Affiliation: [1] School of Science, Shenyang University of Technology

Source: Journal of Shenyang University of Technology, 2019, No. 5, pp. 555-559 (5 pages)

Funding: National Natural Science Foundation of China (61074005)

Abstract: The k-prototypes clustering algorithm selects its initial cluster centers at random, which makes its clustering results unstable, and most existing clustering algorithms for mixed-attribute data deliver low clustering quality. To address both problems, an improved k-prototypes clustering algorithm based on the average difference degree is proposed. The initial cluster centers are selected using the average difference degree, which removes the randomness of the initial selection. In addition, the attribute weights of the numerical data are determined by information entropy, the dissimilarity measure for categorical attributes is improved, and a dissimilarity measure for mixed-attribute data is given. The results show that the improved algorithm achieves higher accuracy and handles mixed-attribute data effectively.
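The abstract names two ingredients without giving their formulas: entropy-based weights for numerical attributes and a dissimilarity measure for mixed-attribute data. The sketch below is only an illustration of those general ideas and is not the paper's method. It assumes the widely used entropy weight method for the numerical weights and the classic k-prototypes form of the mixed dissimilarity (weighted squared differences plus `gamma` times the number of categorical mismatches). The function names `entropy_weights` and `mixed_distance` and the parameter `gamma` are hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy weight method over numeric columns (an assumed variant,
    not necessarily the formula used in the paper)."""
    n, m = len(rows), len(rows[0])
    if n < 2:
        # Entropy is undefined for a single record; fall back to equal weights.
        return [1.0 / m] * m
    diversities = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant column carries no information: zero diversity.
            diversities.append(0.0)
            continue
        # Min-max scale, then turn the column into a probability distribution.
        scaled = [(v - lo) / (hi - lo) for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Normalised Shannon entropy in [0, 1].
        e = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(1.0 - e)
    total_div = sum(diversities)
    if total_div == 0:
        return [1.0 / m] * m
    return [d / total_div for d in diversities]


def mixed_distance(x_num, x_cat, c_num, c_cat, weights, gamma=1.0):
    """Classic k-prototypes style dissimilarity with weighted numeric part."""
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_num, c_num))
    # Simple matching: count categorical attributes that differ.
    categorical = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * categorical


if __name__ == "__main__":
    data = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [10.0, 5.0]]
    w = entropy_weights(data)
    print(w)
    print(mixed_distance([1.0, 5.0], ["a", "x"], [3.0, 5.0], ["a", "y"], w, gamma=0.5))
```

In this toy data the second column is constant, so all of the weight goes to the first column. The average-difference-degree initialization is not sketched because the abstract does not define it.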

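Keywords: k-prototypes algorithm; clustering; initial cluster center; mixed attribute data; average difference degree; information entropy; attribute weight; metric formula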

CLC number: TP [Automation and Computer Technology]

 
