K-Modes algorithm in a non-independent and identically distributed (non-IID) context


Authors: ZHOU Hui-xin; JIANG He [1]; WANG Yan-mei (School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China)

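Affiliation: [1] School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong, China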

Source: Computer Engineering and Design, 2023, No. 1, pp. 182-187 (6 pages)

Funding: National Natural Science Foundation of China, Young Scientists Fund (61502259).

Abstract: In the traditional K-Modes algorithm, the initial cluster centers are selected at random, so the clustering result depends heavily on that selection, which degrades clustering quality. In addition, many studies of K-Modes assume that the data are independent and identically distributed (IID). In real data, however, objects and attributes are related to one another through coupling relationships, so the data are non-IID. To address these two problems, the selection of the initial centers is improved by pre-clustering the data with hierarchical clustering, and the non-IID idea is introduced into the computation of the dissimilarity measure. Both changes are verified experimentally. The results show that improving the initial-center selection and the dissimilarity computation raises the clustering accuracy of the K-Modes algorithm.
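The two changes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the linkage criterion or the coupled (non-IID) dissimilarity, so this sketch assumes average linkage for the pre-clustering and uses plain simple-matching (Hamming) dissimilarity as a placeholder. The function names `hamming`, `mode_of`, `hierarchical_init` and `k_modes` are hypothetical; a coupled dissimilarity would be passed in through the `dissim` parameter.

```python
from collections import Counter


def hamming(a, b):
    """Simple-matching dissimilarity: number of attributes that differ.
    Placeholder only; the paper's non-IID measure is not given in the abstract."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(rows):
    """Attribute-wise most frequent value (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def hierarchical_init(data, k, dissim=hamming):
    """Agglomerative pre-clustering down to k groups; return the mode of each group.
    Average linkage is an assumption made for this sketch."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dissim(data[p], data[q])
                            for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair (j > i)
    return [mode_of([data[p] for p in c]) for c in clusters]


def k_modes(data, k, dissim=hamming, max_iter=20):
    """K-Modes iterations started from the pre-clustered modes instead of random ones."""
    modes = hierarchical_init(data, k, dissim)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dissim(row, modes[c]))
                      for row in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the old mode if a cluster becomes empty
                modes[c] = mode_of(members)
    return labels, modes
```

Because the starting modes come from a deterministic pre-clustering step, repeated runs on the same data give the same result, which is the dependence on random initialization that the abstract aims to remove. The pairwise search shown here is quadratic per merge and is only suitable for small illustrative data sets.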

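Keywords: K-Modes algorithm; initial centers; independent and identically distributed (IID); non-IID; coupling relationship; hierarchical clustering; dissimilarity measure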

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
