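Authors: YANG Zhiyong, JIANG Feng[1], YU Xu[1], DU Junwei[1] (School of Information Science & Technology, Qingdao University of Science and Technology, Qingdao 266100, China)
Affiliation: [1] School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266100, China
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2023, No. 1, pp. 56-65 (10 pages)
Funding: National Natural Science Foundation of China (61973180, 61671261); Natural Science Foundation of Shandong Province (ZR2021MF092, ZR2022MF326).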
Abstract: In recent years, the clustering of mixed-type data has received wide attention. The K-prototype clustering algorithm is an effective method for mixed-type data, but it usually initializes cluster centers by random selection, a strategy that makes it difficult to guarantee the quality of clustering results in many practical applications. To address this problem, this paper selects initial centers for the K-prototype algorithm based on outlier detection and proposes a new initialization algorithm for mixed-type data clustering, IKP-ODD (initialization of K-prototype clustering based on outlier detection and density). Given a candidate object, IKP-ODD decides whether it is an initial center by calculating its distance outlier factor, its weighted density, and its weighted distances from the existing initial centers. Using the distance outlier factor and the weighted density prevents outliers from being selected as initial centers. When calculating the weighted densities of objects and the weighted distances between objects, the granular neighborhood entropy from neighborhood rough sets is used to compute the significance of each attribute, and attributes are assigned different weights according to their significance, which effectively reflects the differences between attributes. Experiments on several UCI datasets show that IKP-ODD solves the initialization problem of K-prototype clustering better than existing initialization methods.
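Keywords: clustering initialization; mixed-type data; outlier detection; neighborhood rough sets; granular neighborhood entropy; distance outlier factor; weighted density; weighted distance
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]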
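
The abstract describes choosing initial centers from weighted densities and weighted distances over mixed-type attributes. The following is only a rough sketch of that general idea, not the IKP-ODD algorithm itself: the scoring rule (density multiplied by the distance to the nearest chosen center), the `radius` parameter, the function names, and the hand-set attribute weights are all assumptions made for illustration. The paper derives weights from granular neighborhood entropy and also uses a distance outlier factor, neither of which is reproduced here.

```python
def weighted_distance(x, y, weights, categorical):
    """Weighted mixed-type distance (illustrative assumption).

    Numeric attributes (assumed scaled to [0, 1]) contribute |a - b|;
    categorical attributes contribute 1 on mismatch and 0 on match.
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        diff = float(a != b) if j in categorical else abs(a - b)
        total += weights[j] * diff
    return total


def weighted_density(i, data, weights, categorical, radius):
    """Fraction of objects within `radius` of object i (itself included)."""
    close = sum(
        1
        for row in data
        if weighted_distance(data[i], row, weights, categorical) <= radius
    )
    return close / len(data)


def select_initial_centers(data, k, weights, categorical, radius):
    """Greedy selection: start from the densest object, then repeatedly add
    the object maximizing density * distance to its nearest chosen center."""
    n = len(data)
    dens = [weighted_density(i, data, weights, categorical, radius) for i in range(n)]
    centers = [max(range(n), key=lambda i: dens[i])]
    while len(centers) < k:
        def score(i):
            nearest = min(
                weighted_distance(data[i], data[c], weights, categorical)
                for c in centers
            )
            return dens[i] * nearest

        candidates = [i for i in range(n) if i not in centers]
        centers.append(max(candidates, key=score))
    return centers


# Toy data: two tight groups plus one isolated object (index 6).
data = [
    (0.10, "a"), (0.12, "a"), (0.11, "a"),
    (0.90, "b"), (0.88, "b"), (0.91, "b"),
    (0.50, "c"),
]
weights = [0.5, 0.5]   # hand-set here; the paper computes these from entropy
categorical = {1}      # index of the categorical attribute
centers = select_initial_centers(data, k=2, weights=weights,
                                 categorical=categorical, radius=0.05)
print(centers)
```

In this toy run the isolated object has low density, so the product score keeps it from being picked ahead of an object in the second group. With a larger `k` the simple score alone would no longer exclude it, which is the kind of case an explicit outlier factor is meant to handle.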