Authors: 贾子琪 (JIA Zi-qi); 宋玲 (SONG Ling) [1,2]
Affiliations: [1] School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; [2] Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2020, No. 9, pp. 1845-1852 (8 pages)
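Funding: Supported by the National Natural Science Foundation of China (61762030), the Guangxi Innovation-Driven Development Major Special Project (桂科AA17204017), and the Guangxi Key Research and Development Program (桂科AB19110050, 桂科AB18126094).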
Abstract: Mixed data sets containing both numerical and categorical data are common in practical applications. The classical k-prototypes algorithm balances the contributions of categorical and numerical data through a manually set parameter γ, and γ has a large influence on the clustering result. To avoid feature conversion between data types and parameter tuning, and to handle feature weighting in high-dimensional mixed data clustering, we propose a categorical dissimilarity coefficient based on entropy weight, a quantized numerical dissimilarity coefficient, and a mixed dissimilarity coefficient suited to mixed data clustering. The proposed coefficients take full account of the importance of categorical feature values and the mean of numerical feature values, and they follow a unified criterion, so the dissimilarity between data objects and clusters can be calculated more objectively. In addition, the weighted mixed dissimilarity coefficient is applied to the classical k-prototypes algorithm, yielding a k-prototypes clustering algorithm for mixed data (KPMD). Experiments on real UCI data sets verify the effectiveness and robustness of the KPMD algorithm.
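For context, the sensitivity to γ mentioned in the abstract can be illustrated with the classical k-prototypes dissimilarity: squared Euclidean distance on the numerical features plus γ times the number of categorical mismatches. The sketch below is not the KPMD coefficient proposed in the paper, whose formulas are not given in this record. The function names, prototypes, and sample point are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[str]]


def kprototypes_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    gamma: float,
) -> float:
    """Classical k-prototypes dissimilarity (not the paper's KPMD coefficient)."""
    # Numerical part: squared Euclidean distance to the prototype.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: simple matching, counting attributes that differ.
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    # gamma sets how much the categorical mismatches weigh against the numeric part.
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    prototypes: List[Prototype],
    gamma: float,
) -> int:
    """Return the index of the prototype with the smallest dissimilarity."""
    distances = [
        kprototypes_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)


# Hypothetical data: the point is numerically close to prototype 0
# but matches prototype 1 on both categorical attributes.
prototypes: List[Prototype] = [
    ([0.0, 0.0], ["red", "small"]),
    ([1.0, 1.0], ["blue", "large"]),
]
point_num, point_cat = [0.2, 0.1], ["blue", "large"]

small_gamma_choice = nearest_prototype(point_num, point_cat, prototypes, gamma=0.1)
large_gamma_choice = nearest_prototype(point_num, point_cat, prototypes, gamma=2.0)
```

With these hypothetical values, the same point is assigned to a different prototype depending only on γ. This is the kind of manual tuning the abstract says the proposed coefficients are meant to avoid.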
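Keywords: k-prototypes; mixed dissimilarity coefficient; entropy; categorical data; numerical data; mixed data
Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]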