Kernel Subspace Clustering Algorithm for Categorical Data (类属型数据核子空间聚类算法)  Cited by: 5


Authors: XU Kun-Peng (徐鲲鹏), CHEN Li-Fei (陈黎飞), SUN Hao-Jun (孙浩军) [3], WANG Bei-Zhan (王备战) [4]

Affiliations: [1] College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, Fujian, China; [2] Digital Fujian Internet-of-Things Laboratory of Environmental Monitoring (Fujian Normal University), Fuzhou 350117, Fujian, China; [3] College of Engineering, Shantou University, Shantou 515063, Guangdong, China; [4] College of Software, Xiamen University, Xiamen 361005, Fujian, China

Source: Journal of Software (《软件学报》), 2020, No. 11, pp. 3492-3505 (14 pages)

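Funding: National Natural Science Foundation of China (U1805263, 61672157); Fujian Provincial Department of Science and Technology project (JK2017007); Fujian Normal University Innovation Team project (IRTL1704).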

Abstract: Most existing subspace clustering methods for categorical data assume that features are mutually independent and therefore ignore linear or nonlinear correlations between attributes. This paper proposes a kernel subspace clustering method for categorical data. First, kernel functions originally designed for continuous data are introduced to project categorical data into a kernel space, and a feature-weighted similarity measure for categorical data is defined in that space. Second, an objective function for kernel subspace clustering of categorical data is derived from this measure, and an efficient optimization method is proposed to solve it. Finally, a kernel subspace clustering algorithm for categorical data is defined. The algorithm accounts for relationships between attributes in a nonlinear space and, during clustering, assigns each attribute a feature weight that measures its relevance to the clusters, which amounts to embedded feature selection for categorical attributes. A cluster validity index is also defined to evaluate the quality of categorical clustering results. Experiments on synthetic and real-world datasets show that, compared with existing subspace clustering algorithms, the proposed algorithm can uncover nonlinear relationships among categorical attributes and effectively improves the quality of the clustering results.
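As a rough illustration of what a feature-weighted similarity measure in kernel space can look like, the sketch below applies a Gaussian kernel to one-hot encodings of categorical values and sums the per-attribute kernel-space distances using attribute weights. This is not the paper's measure, which this record does not give: the one-hot encoding, the per-attribute Gaussian kernel, the bandwidth `sigma`, and both function names are assumptions made purely for illustration.

```python
import math


def gaussian_kernel_categorical(a, b, sigma=1.0):
    """Gaussian kernel between two category symbols (illustrative assumption).

    Each symbol is treated as a one-hot vector, so two different symbols
    are at squared Euclidean distance 2 and identical symbols at 0.
    """
    sq_dist = 0.0 if a == b else 2.0
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def weighted_kernel_distance(x, y, weights, sigma=1.0):
    """Feature-weighted squared distance between two categorical objects.

    For each attribute d the kernel-space squared distance is
    k(x_d, x_d) + k(y_d, y_d) - 2 k(x_d, y_d) = 2 - 2 k(x_d, y_d),
    and the per-attribute terms are combined with the weights.
    """
    total = 0.0
    for a, b, w in zip(x, y, weights):
        total += w * (2.0 - 2.0 * gaussian_kernel_categorical(a, b, sigma))
    return total


# Two objects that agree on the first attribute and differ on the second.
x = ("red", "small")
y = ("red", "large")
print(weighted_kernel_distance(x, y, weights=[0.5, 0.5]))
```

Setting an attribute's weight to zero removes it from the distance entirely, which is the sense in which per-attribute weights act as a soft form of feature selection.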

Keywords: clustering; categorical data; kernel method; nonlinear measure; subspace

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
