Authors: DENG Xiuqin (邓秀勤) [1]; ZHENG Liping (郑丽苹); ZHANG Yiqun (张逸群); LIU Dongdong (刘冬冬)
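Affiliations: [1] School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510520, Guangdong, China; [2] School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, Guangdong, China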
Source: Journal of Zhengzhou University (Engineering Science) (《郑州大学学报(工学版)》), 2023, No. 2, pp. 53-60 (8 pages)
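Funding: National Natural Science Foundation of China (12101136); Natural Science Foundation of Guangdong Province (2022A1515011592); Guangdong Province Graduate Education Innovation Program (2021SFKC030).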
Abstract: Real datasets often contain both categorical and numerical attributes, and categorical attributes can be further divided into ordinal and nominal attributes. Datasets with both categorical and numerical attributes are called heterogeneous-attribute data. Existing distance metrics for heterogeneous-attribute data do not distinguish ordinal attributes within the categorical attributes, which loses information and degrades clustering quality. To address this problem, a subspace clustering algorithm for heterogeneous-attribute data based on a new distance metric is proposed. First, existing approaches to distance metrics for heterogeneous-attribute data, and existing solutions for distinguishing ordinal attributes, are summarized. Second, distance formulas between attribute values are defined separately for ordinal, nominal, and numerical attributes, based on the data characteristics of each attribute type. Third, a dynamic weighting scheme is given that weights the different attributes during clustering according to two factors: inter-cluster difference and intra-cluster distance. Finally, the distance formulas and the weighting mechanism are combined into a distance metric applicable to heterogeneous-attribute data, on which the subspace clustering algorithm is built. Because the algorithm both unifies the distance metric for heterogeneous-attribute data and searches for clusters in subspaces, it achieves good clustering results on heterogeneous-attribute datasets. Comparative experiments on 11 real datasets verify its effectiveness.
Keywords: heterogeneous-attribute data; ordinal attribute; distance metric; subspace clustering algorithm; dynamic weight
Classification No.: O235 [Science: Operations Research and Cybernetics]; TP311.13 [Science: Mathematics]
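
The abstract describes per-type distances (ordinal, nominal, numerical) combined through attribute weights, but this record does not give the formulas themselves. The following is only a minimal illustrative sketch of that general idea, using common textbook choices: normalized rank difference for ordinal values, 0/1 mismatch for nominal values, and range-normalized absolute difference for numerical values. All function names, the `kinds`/`params` layout, and the fixed `weights` are assumptions made for illustration; they are not the paper's definitions, and the paper's dynamic weighting by inter-cluster difference and intra-cluster distance is not implemented here.

```python
from typing import Any, Sequence


def ordinal_distance(a: int, b: int, n_levels: int) -> float:
    """Assumed ordinal distance: rank gap scaled to [0, 1]."""
    if n_levels < 2:
        return 0.0
    return abs(a - b) / (n_levels - 1)


def nominal_distance(a: Any, b: Any) -> float:
    """Assumed nominal distance: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def numerical_distance(a: float, b: float, value_range: float) -> float:
    """Assumed numerical distance: absolute gap divided by the attribute range."""
    if value_range == 0:
        return 0.0
    return abs(a - b) / value_range


def mixed_distance(
    x: Sequence[Any],
    y: Sequence[Any],
    kinds: Sequence[str],
    params: Sequence[Any],
    weights: Sequence[float],
) -> float:
    """Weighted sum of per-attribute distances (weights are fixed inputs here)."""
    total = 0.0
    for xi, yi, kind, p, w in zip(x, y, kinds, params, weights):
        if kind == "ordinal":
            d = ordinal_distance(xi, yi, p)      # p = number of ordered levels
        elif kind == "nominal":
            d = nominal_distance(xi, yi)         # p is unused
        elif kind == "numerical":
            d = numerical_distance(xi, yi, p)    # p = max - min of the attribute
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
        total += w * d
    return total
```

In a subspace clustering setting the weights would typically differ per cluster and be updated at each iteration; here they are passed in unchanged so that the sketch stays self-contained.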