Improved k-prototypes clustering algorithm based on average difference degree (times cited: 4)

Authors: SHI Hong-yan (石鸿雁)[1], XU Ming-ming (徐明明) (School of Science, Shenyang University of Technology, Shenyang 110870, China)

Affiliation: [1] School of Science, Shenyang University of Technology

Source: Journal of Shenyang University of Technology (《沈阳工业大学学报》), 2019, No. 5, pp. 555-559 (5 pages)

Funding: National Natural Science Foundation of China (61074005)

Abstract: The k-prototypes clustering algorithm selects its initial cluster centers at random, which makes its clustering results unstable. In addition, most existing clustering algorithms for mixed-attribute data do not achieve high clustering quality. To address these problems, an improved k-prototypes clustering algorithm based on the average difference degree is proposed. The initial cluster centers are selected using the average difference degree, which removes the randomness of initial center selection. The attribute weights of the numerical attributes are determined by information entropy. The dissimilarity measure for categorical attributes is improved, and a dissimilarity measure for mixed-attribute data is given. The results show that the improved algorithm achieves higher accuracy and can effectively process mixed-attribute data.
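The abstract names the ingredients of the method but not its formulas. The Python sketch below therefore only illustrates the general shape of such an algorithm under explicit assumptions. It is not the authors' implementation. It makes three assumptions:

(1) Numerical attributes are min-max normalized and weighted with the common entropy weight method.

(2) Categorical attributes use the simple-matching count of the standard k-prototypes measure, with mismatches scaled by a parameter `gamma`. This is not the paper's improved categorical measure.

(3) The "average difference degree" of an object is taken to be its mean dissimilarity to all other objects. The first center is the object with the smallest average, and each further center is the object farthest from the centers already chosen. This is a hypothetical seeding rule.

All function names (`minmax`, `entropy_weights`, `dissim`, `init_centers`, `k_prototypes`) are hypothetical.

```python
import math
from collections import Counter


def minmax(num_rows):
    """Scale each numeric column to [0, 1]; constant columns become 0."""
    out_cols = []
    for col in zip(*num_rows):
        lo, hi = min(col), max(col)
        out_cols.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    return [list(row) for row in zip(*out_cols)]


def entropy_weights(norm_rows):
    """Entropy weight method (assumption): low-entropy columns get more weight."""
    n, m = len(norm_rows), len(norm_rows[0])
    div = []
    for j in range(m):
        col = [row[j] for row in norm_rows]
        total = sum(col)
        if total == 0 or n < 2:  # constant column carries no information
            div.append(0.0)
            continue
        p = [v / total for v in col]
        e = -sum(pj * math.log(pj) for pj in p if pj > 0) / math.log(n)
        div.append(max(0.0, 1.0 - e))  # divergence 1 - e, clamped at 0
    s = sum(div)
    return [d / s for d in div] if s > 0 else [1.0 / m] * m


def dissim(xn, xc, pn, pc, w, gamma):
    """Entropy-weighted squared distance plus gamma * categorical mismatches."""
    num = sum(wj * (a - b) ** 2 for wj, a, b in zip(w, xn, pn))
    cat = sum(1 for a, b in zip(xc, pc) if a != b)
    return num + gamma * cat


def init_centers(num, cat, k, w, gamma):
    """Hypothetical deterministic seeding from average dissimilarities."""
    n = len(num)
    d = [[dissim(num[i], cat[i], num[j], cat[j], w, gamma) for j in range(n)]
         for i in range(n)]
    avg = [sum(row) / (n - 1) for row in d]  # mean dissimilarity to the others
    centers = [min(range(n), key=lambda i: avg[i])]  # most "central" object
    while len(centers) < k:
        # farthest-first: object whose nearest chosen center is farthest away
        centers.append(max(range(n), key=lambda i: min(d[i][c] for c in centers)))
    return centers


def k_prototypes(num, cat, k, gamma=1.0, max_iter=100):
    num = minmax(num)
    w = entropy_weights(num)
    protos = [(num[i][:], cat[i][:]) for i in init_centers(num, cat, k, w, gamma)]
    labels = None
    for _ in range(max_iter):
        # assign each object to its nearest prototype
        new = [min(range(k), key=lambda c: dissim(xn, xc, *protos[c], w, gamma))
               for xn, xc in zip(num, cat)]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                continue  # keep the old prototype for an empty cluster
            pn = [sum(num[i][j] for i in members) / len(members)
                  for j in range(len(num[0]))]  # numeric part: column means
            pc = [Counter(cat[i][j] for i in members).most_common(1)[0][0]
                  for j in range(len(cat[0]))]  # categorical part: column modes
            protos[c] = (pn, pc)
    return labels, w


if __name__ == "__main__":
    num = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5], [8.0, 50.0], [8.5, 52.0], [7.9, 49.0]]
    cat = [["a", "x"], ["a", "x"], ["a", "y"], ["b", "z"], ["b", "z"], ["b", "z"]]
    labels, weights = k_prototypes(num, cat, k=2)
    print(labels, weights)
```

Center selection in this sketch involves no random draws, so repeated runs on the same data return the same partition. That is the stability property the abstract targets. Farthest-first selection is only one deterministic choice and is known to be sensitive to outliers; the paper's own rule may differ.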

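Keywords: k-prototypes algorithm; clustering; initial cluster center; mixed attribute data; average difference degree; information entropy; attribute weight; metric formula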

CLC number: TP [Automation and Computer Technology]
