A k-prototypes Clustering Algorithm for Mixed Data Clustering  (Cited by: 7)

Authors: JIA Zi-qi; SONG Ling [1,2]

Affiliations: [1] School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; [2] Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2020, No. 9, pp. 1845-1852 (8 pages)

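Funding: Supported by the National Natural Science Foundation of China (61762030); the Guangxi Innovation-Driven Major Special Project (桂科AA17204017); and the Guangxi Key Research and Development Program (桂科AB19110050, 桂科AB18126094).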

Abstract: Mixed data sets containing both numerical and categorical data are common in practical applications. The classical k-prototypes algorithm balances the contributions of categorical and numerical data through a manually set parameter γ, and γ has a strong influence on the clustering result. To avoid feature conversion between data types and parameter tuning, and to handle feature weighting in high-dimensional mixed data clustering, this paper proposes a categorical dissimilarity coefficient based on entropy weight, a quantized numerical dissimilarity coefficient, and a mixed dissimilarity coefficient suited to mixed data clustering. The proposed coefficients take into account the importance of categorical feature values and the mean of numerical feature values, follow a unified criterion, and allow the dissimilarity between data objects and clusters to be computed more objectively. In addition, the weighted mixed dissimilarity coefficient is applied to the classical k-prototypes algorithm, yielding a k-prototypes clustering algorithm for mixed data (KPMD). Experiments on real UCI data sets verify the effectiveness and robustness of the KPMD algorithm.
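For context, the sensitivity to γ that the abstract mentions can be seen in the classical k-prototypes cost, which adds a squared Euclidean term for numerical features to γ times a simple-matching count for categorical features. The sketch below illustrates only that classical baseline; it is not the KPMD coefficient, whose formulas are not given in this record. The function names and the toy data are assumptions made up for illustration.

```python
def kprototypes_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma):
    """Classical k-prototypes cost between one object and one prototype."""
    # Numerical part: squared Euclidean distance to the prototype's means.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: simple matching, one unit per mismatched feature.
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


def assign(x_num, x_cat, prototypes, gamma):
    """Return the index of the prototype with the smallest dissimilarity."""
    costs = [
        kprototypes_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return costs.index(min(costs))


# Toy data (hypothetical): one numerical and two categorical features.
prototypes = [([1.5], ["blue", "large"]), ([3.0], ["red", "small"])]
x_num, x_cat = [1.0], ["red", "small"]

# The same object lands in a different cluster depending only on gamma.
small_gamma_cluster = assign(x_num, x_cat, prototypes, gamma=0.5)
large_gamma_cluster = assign(x_num, x_cat, prototypes, gamma=3.0)
print(small_gamma_cluster, large_gamma_cluster)
```

With a small γ the numerical distance dominates and the object joins the numerically closer prototype. With a large γ the categorical mismatches dominate and it joins the other one. This is the kind of hand-tuned trade-off the abstract says KPMD is meant to avoid.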

Keywords: k-prototypes; mixed dissimilarity coefficient; categorical data; numerical data; mixed data

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
