Relation Hierarchical Distance Clustering of Mixed Data Based on Mutual Information and Bayesian Belief Networks  Cited by: 2


Authors: CAI Jincheng (蔡金成); SUN Haojun (孙浩军)[1] (College of Engineering, Shantou University, Guangdong, China)

Affiliation: [1] College of Engineering, Shantou University, Shantou, Guangdong 515063, China

Source: Journal of Shantou University: Natural Science Edition (《汕头大学学报(自然科学版)》), 2018, No. 2, pp. 3-12 (10 pages)

Funding: National Natural Science Foundation of China (61170130)

Abstract: Clustering is an important function in data mining; its main purpose is to discover latent knowledge in data. Most clustering algorithms published to date handle only purely numeric or purely categorical data, mainly because similarity between records that mix several attribute types is hard to measure. This paper proposes a similarity measure for mixed data. For categorical attributes, a Bayesian belief network is built using mutual information, a relation hierarchy is built from the Bayesian belief network, and distances are attached to the hierarchy to form the relation hierarchical distance. For numeric attributes, similarity is measured with the standardized Manhattan distance. The categorical and numeric parts are then combined to measure similarity over the whole dataset. On this basis, a clustering algorithm for mixed data, CRHD, is designed and implemented. Simulation experiments on several UCI datasets compare CRHD with existing algorithms and demonstrate its effectiveness.
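The abstract names two building blocks without giving formulas: mutual information between categorical attributes and a standardized Manhattan distance for numeric attributes. The following is a minimal sketch of those two quantities only, not the paper's CRHD algorithm. The function names `mutual_information` and `normalized_manhattan` are hypothetical, and scaling each numeric attribute by its range is an assumption about what "standardized" means here; the paper may use a different normalization.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two categorical columns.

    Hypothetical helper: estimates I(X;Y) from co-occurrence counts.
    """
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def normalized_manhattan(a, b, ranges):
    """Manhattan distance with each attribute divided by its range, then averaged.

    Assumption: range scaling stands in for the paper's standardization.
    Attributes with zero range contribute nothing.
    """
    total = 0.0
    for ai, bi, r in zip(a, b, ranges):
        total += abs(ai - bi) / r if r > 0 else 0.0
    return total / len(ranges)


# Two perfectly dependent columns share log(2) nats of information;
# two independent columns share none.
dependent = mutual_information(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = mutual_information(["a", "a", "b", "b"], ["x", "y", "x", "y"])
numeric_distance = normalized_manhattan([1.0, 10.0], [3.0, 20.0], [4.0, 20.0])
```

In a pipeline of the kind the abstract describes, pairwise mutual information would presumably guide which edges enter the Bayesian belief network, while the numeric distance would be combined with the categorical relation hierarchical distance; how the two are weighted is not stated in the abstract.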

Keywords: clustering; mixed data; mutual information; Bayesian belief network; hierarchical distance; relation hierarchical distance

Classification code: TP301 [Automation and Computer Technology - Computer System Architecture]

 
