Subspace Clustering of Heterogeneous-attribute Data Based on a New Distance Metric (cited by: 3)

Authors: DENG Xiuqin [1]; ZHENG Liping; ZHANG Yiqun; LIU Dongdong (邓秀勤, 郑丽苹, 张逸群, 刘冬冬)

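Affiliations: [1] School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510520, Guangdong, China; [2] School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, Guangdong, China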

Source: Journal of Zhengzhou University (Engineering Science) (《郑州大学学报（工学版）》), 2023, No. 2, pp. 53-60 (8 pages)

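Funding: National Natural Science Foundation of China (12101136); Natural Science Foundation of Guangdong Province (2022A1515011592); Guangdong Province Graduate Education Innovation Program (2021SFKC030).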

Abstract: Real-world datasets often contain both categorical and numerical attributes, and categorical attributes can be further divided into ordinal and nominal attributes. Datasets with both categorical and numerical attributes are called heterogeneous-attribute data. Existing distance metrics for heterogeneous-attribute data do not distinguish ordinal attributes among the categorical attributes, which loses information and degrades clustering performance. To address this problem, a subspace clustering algorithm for heterogeneous-attribute data based on a new distance metric is proposed. First, existing approaches to distance metrics for heterogeneous-attribute data and existing solutions for distinguishing ordinal attributes are summarized. Second, distance formulas between attribute values are defined separately for ordinal, nominal, and numerical attributes according to the data characteristics of each attribute type. Third, a dynamic weighting scheme is given for the different attributes during clustering, based on two factors: inter-cluster difference and intra-cluster distance. Finally, the distance formulas and the weighting mechanism are combined into a distance metric applicable to heterogeneous-attribute data, on which the subspace clustering algorithm is built. Because the algorithm both unifies the distance metric for heterogeneous-attribute data and searches for clusters in subspaces, it achieves good clustering results on heterogeneous-attribute datasets. Comparative experiments on 11 real datasets verify its effectiveness.
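The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the general idea: one distance per attribute type, combined through per-attribute weights. Every function name and every formula here is an assumption made for illustration (normalized rank gap for ordinal values, 0/1 mismatch for nominal values, range-normalized absolute difference for numerical values, and weights proportional to inter-cluster difference over intra-cluster distance). None of it should be read as the authors' actual definitions.

```python
from typing import Any, List, Sequence


def ordinal_distance(a: int, b: int, n_levels: int) -> float:
    """Assumed form: rank gap between levels coded 0..n_levels-1, scaled to [0, 1]."""
    if n_levels < 2:
        return 0.0
    return abs(a - b) / (n_levels - 1)


def nominal_distance(a: Any, b: Any) -> float:
    """Assumed form: simple mismatch, 0 if equal and 1 otherwise."""
    return 0.0 if a == b else 1.0


def numerical_distance(a: float, b: float, value_range: float) -> float:
    """Assumed form: absolute difference scaled by the attribute's range."""
    if value_range <= 0:
        return 0.0
    return abs(a - b) / value_range


def attribute_weights(inter: Sequence[float], intra: Sequence[float],
                      eps: float = 1e-12) -> List[float]:
    """Hypothetical weighting: favor attributes with large inter-cluster
    difference and small intra-cluster distance, then normalize to sum to 1."""
    raw = [b / (w + eps) for b, w in zip(inter, intra)]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def mixed_distance(x: Sequence[Any], y: Sequence[Any], kinds: Sequence[str],
                   params: Sequence[Any], weights: Sequence[float]) -> float:
    """Weighted sum of per-attribute distances between objects x and y.

    kinds[i] is "ordinal", "nominal", or "numerical"; params[i] is the number
    of levels (ordinal), unused (nominal), or the value range (numerical).
    """
    total = 0.0
    for xi, yi, kind, p, w in zip(x, y, kinds, params, weights):
        if kind == "ordinal":
            d = ordinal_distance(xi, yi, p)
        elif kind == "nominal":
            d = nominal_distance(xi, yi)
        elif kind == "numerical":
            d = numerical_distance(xi, yi, p)
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
        total += w * d
    return total
```

In a clustering loop of this style, the weights would be recomputed for each cluster after every assignment step, so that each cluster ends up emphasizing its own subset of attributes; that per-cluster weighting is what makes the search a subspace one.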

Keywords: heterogeneous-attribute data; ordinal attribute; distance metric; subspace clustering algorithm; dynamic weight

Classification codes: O235 [Science: Operations Research and Cybernetics]; TP311.13 [Science: Mathematics]
