Mixed data clustering initialization method using outlier detection technology (cited by: 8)

Authors: YANG Zhiyong; JIANG Feng [1]; YU Xu [1]; DU Junwei [1] (School of Information Science & Technology, Qingdao University of Science and Technology, Qingdao 266100, China)

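Affiliation: [1] School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266100, Shandong, China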

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2023, No. 1, pp. 56-65 (10 pages)

Funding: National Natural Science Foundation of China (61973180, 61671261); Natural Science Foundation of Shandong Province (ZR2021MF092, ZR2022MF326).

Abstract: In recent years, the clustering of mixed-type data has received wide attention. The K-prototype clustering algorithm is an effective method for processing mixed-type data, but it usually initializes its cluster centers by random selection, and in many practical applications this strategy cannot guarantee the quality of the clustering results. To address this problem, we select the initial centers for the K-prototype algorithm using an outlier-detection-based strategy and propose a new initialization algorithm for mixed-type data clustering (initialization of K-prototype clustering based on outlier detection and density, IKP-ODD). Given a candidate object, IKP-ODD decides whether it is an initial center by calculating its distance outlier factor, its weighted density, and its weighted distances from the existing initial centers. The distance outlier factor and the weighted density keep outliers from being selected as initial centers. When calculating the weighted densities of objects and the weighted distances between objects, we use the granular neighborhood entropy from neighborhood rough sets to compute the significance of each attribute and assign each attribute a weight according to its significance, which reflects the differences among attributes. Experiments on several UCI datasets show that IKP-ODD performs better than existing initialization methods on the initialization problem of K-prototype clustering.
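
The abstract names the ingredients of the selection rule (an outlier score, a weighted density, and weighted distances to the centers already chosen) but does not give their formulas. The sketch below only illustrates how such ingredients can be combined; it is not the IKP-ODD algorithm. Everything in it is an assumption made for illustration: the function names `weighted_distance` and `select_initial_centers`, the parameters `n_neighbors` and `outlier_fraction`, the outlier score (mean distance to the nearest neighbors, standing in for the distance outlier factor), the density (inverse of that score), and the uniform attribute weights (standing in for weights derived from granular neighborhood entropy).

```python
import math

def weighted_distance(x, y, num_idx, cat_idx, weights):
    """Assumed mixed-type distance: weighted squared difference on numeric
    attributes plus weighted 0/1 mismatch on categorical attributes."""
    d = 0.0
    for j in num_idx:
        d += weights[j] * (x[j] - y[j]) ** 2
    for j in cat_idx:
        d += weights[j] * (0.0 if x[j] == y[j] else 1.0)
    return math.sqrt(d)

def select_initial_centers(data, k, num_idx, cat_idx, weights,
                           n_neighbors=2, outlier_fraction=0.2):
    """Pick k initial centers (as indices into data), skipping likely outliers."""
    n = len(data)
    dist = [[weighted_distance(data[i], data[j], num_idx, cat_idx, weights)
             for j in range(n)] for i in range(n)]

    # Stand-in outlier score (NOT the paper's distance outlier factor):
    # mean distance to the n_neighbors nearest other objects.
    score = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:n_neighbors]
        score.append(sum(nearest) / len(nearest))

    # Drop the highest-scoring fraction of objects from the candidate set.
    n_drop = int(outlier_fraction * n)
    candidates = sorted(range(n), key=lambda i: score[i])[:n - n_drop]

    # Stand-in density (NOT the paper's weighted density): inverse score.
    density = {i: 1.0 / (1e-12 + score[i]) for i in candidates}

    # First center: the densest candidate. Each later center: the candidate
    # that is both dense and far from every center chosen so far.
    centers = [max(candidates, key=lambda i: density[i])]
    while len(centers) < k:
        remaining = [i for i in candidates if i not in centers]
        best = max(remaining,
                   key=lambda i: density[i] * min(dist[i][c] for c in centers))
        centers.append(best)
    return centers

# Toy mixed-type data: one numeric attribute scaled to [0, 1], one categorical.
data = [(0.10, "a"), (0.12, "a"), (0.11, "a"),
        (0.90, "b"), (0.88, "b"), (0.91, "b"),
        (0.50, "c")]  # the last object is isolated
centers = select_initial_centers(data, k=2, num_idx=[0], cat_idx=[1],
                                 weights=[1.0, 1.0])
print(centers)
```

On this toy data the isolated object is excluded before any center is chosen, and the two centers come from the two dense groups. The chosen indices could then be passed as starting prototypes to any K-prototype implementation that accepts user-supplied initial centers.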

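Keywords: clustering initialization; mixed-type data; outlier detection; neighborhood rough sets; granular neighborhood entropy; distance outlier factor; weighted density; weighted distance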

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
