Authors: CAI Jincheng; SUN Haojun [1] (College of Engineering, Shantou University, Guangdong, China)
Source: Journal of Shantou University (Natural Science Edition), 2018, No. 2, pp. 3-12 (10 pages)
Funding: National Natural Science Foundation of China (61170130)
Abstract: Clustering is one of the most important techniques in data mining; its main purpose is to discover latent knowledge in data. Most published clustering algorithms are limited to either purely numeric or purely categorical data, mainly because similarity between mixed data containing several attribute types is hard to measure. This paper proposes a similarity measure for mixed data. For categorical attributes, mutual information is used to construct a Bayesian belief network, the network is used to build a relation hierarchy, and distances are then attached to the hierarchy to form the relation hierarchical distance. For numeric attributes, similarity is measured with the standardized Manhattan distance. The categorical and numeric parts are then combined to measure similarity over the whole dataset. On this basis, a clustering algorithm for mixed data, CRHD, is designed and implemented. Simulation experiments comparing CRHD with existing algorithms on several UCI datasets demonstrate its effectiveness.
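Keywords: clustering; mixed data; mutual information; Bayesian belief network; hierarchical distance; relation hierarchical distance
Classification: TP301 [Automation and Computer Technology: Computer System Architecture]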
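The abstract describes combining a numeric distance with a categorical distance into a single mixed-data measure. The following is a minimal sketch of that idea only, not the CRHD implementation. It assumes that "standardized" means dividing each numeric difference by the attribute's range, and it uses a simple match/mismatch distance as a stand-in for the paper's relation hierarchical distance, whose details are not given in this record. The names `mixed_distance` and `overlap_distance` are hypothetical.

```python
from typing import Callable, Sequence


def overlap_distance(a: str, b: str) -> float:
    """Stand-in categorical distance: 0 if the values match, 1 otherwise.

    Assumption: this replaces the paper's relation hierarchical distance,
    which is not specified in the abstract.
    """
    return 0.0 if a == b else 1.0


def mixed_distance(
    x_num: Sequence[float],
    y_num: Sequence[float],
    ranges: Sequence[float],
    x_cat: Sequence[str],
    y_cat: Sequence[str],
    cat_dist: Callable[[str, str], float] = overlap_distance,
) -> float:
    """Sum a range-normalized Manhattan distance over the numeric attributes
    and a per-attribute categorical distance over the categorical ones."""
    # Assumption: standardization divides by each attribute's range;
    # attributes with zero range carry no information and are skipped.
    numeric = sum(
        abs(a - b) / r for a, b, r in zip(x_num, y_num, ranges) if r > 0
    )
    categorical = sum(cat_dist(a, b) for a, b in zip(x_cat, y_cat))
    return numeric + categorical


if __name__ == "__main__":
    d = mixed_distance(
        [1.0, 10.0], [3.0, 20.0], [4.0, 20.0],
        ["red", "small"], ["red", "large"],
    )
    print(d)
```

A different categorical measure, such as one derived from an attribute hierarchy, could be supplied through the `cat_dist` parameter without changing the numeric part.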