Authors: Li Shunyong[1], Zhang Miaomiao (School of Mathematical Sciences, Shanxi University, Taiyuan 030006, Shanxi, China)
Source: Computer Applications and Software (《计算机应用与软件》), 2019, No. 1, pp. 284-290 (7 pages)
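Funding: National Natural Science Foundation of China (61573229); Shanxi Province Basic Research Program (201701D121004); Shanxi Province Research Project for Returned Overseas Scholars (2017-020); Shanxi Province Higher Education Teaching Reform and Innovation Project (J2017002)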
Abstract: Determining the number of clusters is an unavoidable problem when clustering mixed data. Building on the Liang k-prototype algorithm, this paper introduces attribute weights and redefines three quantities for mixed data: the sum of between-cluster entropies in absence of a cluster (SBAE_M), the validity index (CUM), and the dissimilarity measure. On this basis it proposes a weighted algorithm for determining the number of clusters in mixed data. The basic idea is as follows: the new k-prototype algorithm clusters the mixed data; CUM and SBAE_M are computed for the clustering result; the worst cluster is eliminated and its objects are reassigned using the new dissimilarity measure; and the number of clusters at which CUM reaches its maximum is taken as the number of clusters. The effectiveness of the algorithm is verified on five UCI data sets.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
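The reassignment step described in the abstract can be illustrated with a minimal Python sketch. This record does not give the paper's formulas, so the dissimilarity below is an assumption: a generic k-prototype-style measure (weighted squared differences on numeric attributes plus weighted simple matching on categorical attributes), not the authors' redefined measure. The function names, the `gamma` balance parameter, and the data layout are all hypothetical, and the choice of which cluster is "worst" (made via SBAE_M in the paper) is taken here as an input.

```python
def weighted_mixed_dissimilarity(obj, proto, w_num, w_cat, gamma=1.0):
    """Assumed k-prototype-style dissimilarity with attribute weights.

    obj and proto are (numeric_values, categorical_values) pairs.
    This is an illustrative stand-in, not the paper's definition.
    """
    (x_num, x_cat), (p_num, p_cat) = obj, proto
    # Weighted squared difference on numeric attributes.
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(w_num, x_num, p_num))
    # Weighted simple matching on categorical attributes (mismatch costs w).
    categorical = sum(w for w, a, b in zip(w_cat, x_cat, p_cat) if a != b)
    return numeric + gamma * categorical


def reassign_after_removal(objects, labels, prototypes, removed,
                           w_num, w_cat, gamma=1.0):
    """Move objects of the removed cluster to their nearest remaining prototype.

    prototypes maps cluster id -> (numeric_values, categorical_values).
    Which cluster counts as "worst" is decided elsewhere (hypothetical input).
    """
    remaining = [k for k in prototypes if k != removed]
    new_labels = list(labels)
    for i, obj in enumerate(objects):
        if labels[i] == removed:
            new_labels[i] = min(
                remaining,
                key=lambda k: weighted_mixed_dissimilarity(
                    obj, prototypes[k], w_num, w_cat, gamma),
            )
    return new_labels


# Toy data: two numeric attributes and one categorical attribute per object.
objects = [((0.0, 0.0), ("a",)), ((5.0, 5.0), ("b",)), ((1.0, 1.0), ("a",))]
labels = [0, 1, 2]
prototypes = {0: ((0.0, 0.0), ("a",)),
              1: ((5.0, 5.0), ("b",)),
              2: ((1.0, 1.0), ("a",))}
w_num, w_cat = (0.5, 0.5), (1.0,)

# Pretend cluster 2 was judged the worst and eliminate it.
new_labels = reassign_after_removal(objects, labels, prototypes, 2, w_num, w_cat)
print(new_labels)
```

In a full procedure this step would be repeated, recording the validity index at each cluster count and keeping the count where it peaks; the index itself is omitted here because its definition is not available in this record.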