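Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of the Ministry of Education (Shanxi University), Taiyuan 030006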
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2010, Issue 10, pp. 1749-1755 (7 pages)
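Funding: National High Technology Research and Development Program of China ("863" Program) (2007AA01Z165); National Natural Science Foundation of China (60773133; 70971080); Natural Science Foundation of Shanxi Province (2008011038); Science and Technology Development Project of Shanxi Provincial Universities (2007103)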
Abstract: K-Modes, the leading partitional clustering technique for categorical data, is one of the most computationally efficient clustering methods for such data. The traditional K-Modes algorithm uses the simple matching dissimilarity measure to compute the distance between two values of the same categorical attribute: it compares the two values directly, giving a difference of 0 when they are identical and 1 otherwise, so the similarity between categorical values is not taken into account. This paper proposes a new distance measure based on rough set theory that overcomes this shortcoming of simple matching, and applies it within the traditional K-Modes clustering algorithm. When computing the distance between two values of the same categorical attribute, the new measure considers not only whether the two values themselves differ but also how well the other related categorical attributes discern between them. The time complexity of the modified K-Modes algorithm is linear in the number of data objects, so it can be applied to large data sets. The K-Modes algorithm with the new distance measure is tested on real-world data sets, and comparisons with K-Modes algorithms based on a number of other distance measures show that the new distance measure is more effective.
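As a rough illustration of the contrast the abstract draws, the Python sketch below implements the simple matching dissimilarity, a plain K-Modes loop with a pluggable distance, and a stand-in rough-set-style value distance. The abstract does not give the paper's formula, so the value distance here is a hypothetical construction and not the authors' measure; the names `rough_membership`, `value_distance_tables`, `k_modes` and `rough_dist` are illustrative. For two distinct values x and y of attribute A, it takes every other attribute B, compares the rough memberships |[x]_A ∩ X| / |[x]_A| and |[y]_A ∩ X| / |[y]_A| for each value set X of B (half their L1 gap measures how well B discerns x from y), averages over B, and maps the result into [0.5, 1].

```python
import random
from collections import Counter
from itertools import combinations


def simple_matching(a, b):
    """Classic K-Modes dissimilarity: each attribute whose values differ adds 1."""
    return sum(x != y for x, y in zip(a, b))


def rough_membership(data, a, x, b, bv):
    """|[x]_A ∩ X| / |[x]_A| with [x]_A = {o : o[a] == x} and X = {o : o[b] == bv}."""
    cls = [o for o in data if o[a] == x]
    return sum(o[b] == bv for o in cls) / len(cls)


def value_distance_tables(data):
    """HYPOTHETICAL rough-set-style value distance (not the paper's formula).

    For distinct values x, y of attribute A: (1 + mean over B != A of TV_B) / 2,
    where TV_B = 0.5 * sum over values bv of B of
    |membership(x, bv) - membership(y, bv)|, i.e. how well B discerns x from y.
    Identical values get 0; distinct values land in [0.5, 1].
    """
    m = len(data[0])
    domains = [{o[i] for o in data} for i in range(m)]
    tables = []
    for a in range(m):
        table = {(v, v): 0.0 for v in domains[a]}
        for x, y in combinations(domains[a], 2):
            gaps = [
                0.5 * sum(abs(rough_membership(data, a, x, b, bv)
                              - rough_membership(data, a, y, b, bv))
                          for bv in domains[b])
                for b in range(m) if b != a
            ]
            extra = sum(gaps) / len(gaps) if gaps else 0.0
            table[(x, y)] = table[(y, x)] = (1.0 + extra) / 2.0
        tables.append(table)
    return tables


def k_modes(data, k, dist, max_iter=100, seed=0):
    """Plain K-Modes: assign objects to the nearest mode, then reset each mode
    to the most frequent value of every attribute within its cluster."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # k objects sampled without replacement
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(o, modes[j])) for o in data]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
        for j in range(k):
            members = [o for o, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels, modes


# Toy categorical data (hypothetical): (colour, size, shape).
data = [
    ("red", "small", "round"), ("red", "small", "oval"), ("red", "large", "round"),
    ("blue", "large", "square"), ("blue", "large", "rect"), ("blue", "small", "square"),
]
tables = value_distance_tables(data)


def rough_dist(a, b):
    """Object distance: sum of the per-attribute value distances."""
    return sum(t[(x, y)] for t, x, y in zip(tables, a, b))


labels, modes = k_modes(data, 2, rough_dist)
print(labels, modes)
```

On this toy data, `round` and `oval` get distance 0.625 while `round` and `square` get 0.75, whereas simple matching scores both pairs as 1. The table builder is deliberately naive (it rescans the data for every membership); once the tables exist, each assignment pass costs O(n·k·m) lookups, which is linear in the number of objects n, consistent with the complexity the abstract reports for the modified algorithm. Passing `simple_matching` instead of `rough_dist` to `k_modes` gives the traditional algorithm for comparison.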
Keywords: clustering algorithm; categorical data; rough set; rough membership; distance measure
CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]