K-Modes Clustering Algorithm Based on a New Distance Measure  (Cited by: 46)

Authors: LIANG Jiye[1,2], BAI Liang[1], CAO Fuyuan[1,2]

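Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan 030006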

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2010, No. 10, pp. 1749-1755 (7 pages)

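Funding: National High Technology Research and Development Program of China ("863" Program) (2007AA01Z165); National Natural Science Foundation of China (60773133; 70971080); Natural Science Foundation of Shanxi Province (2008011038); Science and Technology Development Project of Higher Education Institutions of Shanxi Province (2007103)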

Abstract: K-Modes, a leading partitional clustering technique, is one of the most computationally efficient clustering methods for categorical data. The traditional K-Modes algorithm uses the simple matching (0-1) dissimilarity measure to compute the distance between two values of the same categorical attribute: the distance is zero when the two values are identical and one otherwise. This measure does not adequately account for the similarity between categorical values. This paper proposes a new distance measure based on rough set theory that overcomes this shortcoming, and applies it within the traditional K-Modes clustering algorithm. When computing the distance between two values of the same categorical attribute, the new measure considers not only whether the values themselves differ but also how well other related categorical attributes discern them. The time complexity of the modified K-Modes algorithm is linear in the number of data objects, so it can be applied to large data sets. The algorithm with the new distance measure is tested on real-world data sets, and comparisons with K-Modes algorithms based on other distance measures indicate that the new measure is more effective.
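For orientation, the baseline that the abstract criticizes can be sketched in a few lines of Python. This is only an illustration of the traditional K-Modes loop with the simple matching (0-1) dissimilarity; it does not implement the rough-set-based measure proposed in the paper, whose definition is not given in this record. The function names (`simple_matching`, `mode_of`, `k_modes`) and the toy data are assumptions made for this example.

```python
from collections import Counter


def simple_matching(x, y):
    """0-1 matching dissimilarity: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def mode_of(cluster):
    """Attribute-wise most frequent value among the objects of one cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def k_modes(objects, modes, max_iter=100):
    """Plain K-Modes: alternate nearest-mode assignment and mode update."""
    modes = list(modes)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each object goes to the mode with the smallest distance.
        labels = [
            min(range(len(modes)), key=lambda k: simple_matching(obj, modes[k]))
            for obj in objects
        ]
        # Update step: recompute each mode; keep the old mode if a cluster is empty.
        new_modes = []
        for k in range(len(modes)):
            members = [obj for obj, lab in zip(objects, labels) if lab == k]
            new_modes.append(mode_of(members) if members else modes[k])
        if new_modes == modes:  # converged: the modes no longer change
            break
        modes = new_modes
    return labels, modes


# Hypothetical toy data: (colour, size, shape)
objects = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]
labels, modes = k_modes(objects, [objects[0], objects[2]])
```

Under simple matching, "red" versus "blue" and "small" versus "large" each contribute exactly 1 to the distance, no matter how the values co-occur with other attributes. That is the limitation the proposed measure is meant to address.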

Keywords: clustering algorithm; categorical data; rough set; rough membership; distance measure

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
