A Graph Partition-Based Clustering Algorithm for Mixed-Attribute Data  Cited by: 2


Authors: Huang Shucheng (黄树成) [1], Li Tian (李甜) [1], Sha Aihui (沙爱晖) [1]

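Affiliation: [1] School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, Jiangsu, China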

Source: Computer Applications and Software (《计算机应用与软件》), 2013, No. 7, pp. 11-13, 135 (4 pages)

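Funding: National Natural Science Foundation of China (70871057); Natural Science Research Program of Jiangsu Higher Education Institutions (2008DX065J)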

Abstract: Many real-world datasets contain mixed attributes, both numeric and symbolic (categorical), so clustering such data is of considerable practical importance. Classical clustering algorithms handle only numeric or only symbolic attributes and are usually ineffective on mixed-attribute data. Existing mixed-attribute clustering algorithms compute the numeric and symbolic attributes separately and ignore the correlation between the two types, so their clustering quality is unsatisfactory. This paper proposes a graph partition-based clustering algorithm for mixed-attribute data. The algorithm treats each row of attribute values as a graph node and computes the similarity between nodes; an adaptive attribute-weighting method unifies the numeric and symbolic similarities into a single mutually combined similarity matrix. A graph partitioning method then clusters the data, and an iterative optimization adjusts the fit between data points to maximize intra-class similarity and reach the optimal solution. Experimental results show that the proposed algorithm has a clear advantage over other approaches.
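The pipeline described in the abstract (rows as graph nodes, one combined similarity matrix, then a graph cut) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the adaptive weighting scheme or the iterative refinement step, so the sketch assumes a fixed weight equal to the fraction of numeric attributes, range-normalized distance for numeric values, simple matching for symbolic values, and a plain two-way normalized spectral cut computed by power iteration. The names `row_similarity`, `similarity_matrix`, and `spectral_bipartition` are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence, Set


def row_similarity(a: Sequence[Any], b: Sequence[Any], numeric_idx: Set[int],
                   ranges: Dict[int, float], w_num: float) -> float:
    """Weighted mix of numeric and symbolic similarity between two rows."""
    num, cat = [], []
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_idx:
            span = ranges[j]
            # Range-normalized closeness in [0, 1] (assumed measure).
            num.append(1.0 - abs(x - y) / span if span > 0 else 1.0)
        else:
            # Simple matching for symbolic attributes (assumed measure).
            cat.append(1.0 if x == y else 0.0)
    s_num = sum(num) / len(num) if num else 0.0
    s_cat = sum(cat) / len(cat) if cat else 0.0
    return w_num * s_num + (1.0 - w_num) * s_cat


def similarity_matrix(rows: Sequence[Sequence[Any]],
                      numeric_idx: Set[int]) -> List[List[float]]:
    """Build one symmetric similarity matrix over all rows (graph nodes)."""
    n, m = len(rows), len(rows[0])
    ranges = {j: max(r[j] for r in rows) - min(r[j] for r in rows)
              for j in numeric_idx}
    # ASSUMPTION: fixed weight; the paper's adaptive weighting is not given.
    w_num = len(numeric_idx) / m
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            W[i][k] = W[k][i] = row_similarity(rows[i], rows[k],
                                               numeric_idx, ranges, w_num)
    return W


def spectral_bipartition(W: List[List[float]], iters: int = 200) -> List[int]:
    """Two-way normalized spectral cut via deflated power iteration."""
    n = len(W)
    deg = [max(sum(row), 1e-12) for row in W]
    s = [math.sqrt(d) for d in deg]
    total = math.sqrt(sum(deg))
    # Top eigenvector of M = D^-1/2 W D^-1/2 is proportional to sqrt(deg).
    u1 = [x / total for x in s]
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, u1))
        v = [a - dot * b for a, b in zip(v, u1)]  # project out u1
        # Apply (I + M) / 2 so all eigenvalues are non-negative.
        v = [0.5 * (v[i] + sum(W[i][k] * v[k] / (s[i] * s[k])
                               for k in range(n))) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm == 0.0:
            break
        v = [a / norm for a in v]
    # The sign of the rescaled second eigenvector splits the graph in two.
    return [1 if v[i] / s[i] > 0 else 0 for i in range(n)]


# Toy data: column 0 is numeric, column 1 is symbolic.
rows = [[1.0, "a"], [1.2, "a"], [0.9, "a"],
        [5.0, "b"], [5.3, "b"], [4.8, "b"]]
labels = spectral_bipartition(similarity_matrix(rows, {0}))
print(labels)
```

On the toy data the first three rows and the last three rows should fall into different groups; which group receives label 0 or 1 depends on the sign of the eigenvector and carries no meaning. For more than two clusters, or for real data, an established spectral clustering routine applied to the same precomputed similarity matrix would be the more practical choice.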

Keywords: mixed-attribute data; graph partitioning; spectral clustering

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
