Data Dimensionality Reduction and Quality Evaluation of K-means Clustering    Cited by: 3

Data dimensionality reduction and clustering quality evaluation of K-means clustering


Authors: 何帆 (HE Fan), 何选森 (HE Xuansen), 刘润宗 (LIU Runzong), 樊跃平 (FAN Yueping), 熊茂华 (XIONG Maohua) (School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China; School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, China; College of Information Science and Engineering, Hunan University, Changsha 410082, China)

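Affiliations: [1] School of Management and Economics, Beijing Institute of Technology, Beijing 100081; [2] School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363; [3] College of Information Science and Engineering, Hunan University, Changsha 410082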

Source: Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), 2024, No. 1, pp. 131-141 (11 pages)

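Funding: Special Project in Key Fields for Regular Higher Education Institutions of Guangdong Province (2021ZDZX1035); Characteristic Innovation Project of the Department of Education of Guangdong Province (2022KTSCX64).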

Abstract: Cluster analysis is widely used in the big-data era, but there are few effective ways to evaluate clustering quality intuitively. To address this, a processing scheme is proposed that combines data dimensionality reduction with a search for the number of clusters inherent in the data. An augmented matrix is constructed from the data scatter matrices, and linear discriminant analysis (LDA) is used to transform the high-dimensional data into the most discriminative low-dimensional feature subspace, thereby reducing its dimensionality. To solve the random-initialization problem of partitional clustering algorithms, a min-max rule is proposed that avoids empty clusters and ensures the separability of the data. For each clustering result, the silhouette coefficient of every cluster is computed, and clustering quality under different numbers of clusters is evaluated by comparing the sizes of the silhouettes. Simulation results for the K-means algorithm show that the scheme can not only determine visually the number of clusters inherent in unknown data but also provide an effective method for analyzing high-dimensional data.

In the age of big data, data analysis is becoming more and more important, and one of its most important tasks is data classification. In pattern recognition and machine learning, classification can be divided into supervised and unsupervised classification. In supervised classification, the data include both features and class labels. In practical applications, however, data are usually obtained from sensor devices and carry no class labels. As a result, unsupervised classification, especially clustering, plays a crucial role in data analysis. Clustering, as an exploratory data analysis method, can discover the inherent structure of raw data by grouping data samples into different clusters. In the era of the mobile internet, the dimensionality and structure of data are becoming more complex, so cluster analysis of high-dimensional data is unavoidable. Given the huge amount of data to be processed, compression has also become an important topic, because it makes it easier to organize, summarize, and extract the useful information the data contain. Data compression (dimensionality reduction) transforms the data into a new feature subspace of lower dimensionality. Dimensionality reduction mainly includes feature selection and feature extraction. Feature selection chooses a subset of the original features, whereas feature extraction derives relevant information from the feature set to construct a new feature subspace. Dimensionality reduction is therefore not only a basic preprocessing step but also an aid to data visualization. Based on the properties of the generated clusters, clustering can be divided into partitional clustering and hierarchical clustering. In academia and industry, however, partitional clustering is the most widely used. Among partitional clustering methods, the K-means algorithm has become the most classic and popular, largely because of its low computational complexity. The K-means algorithm has achieved…
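The abstract says that linear discriminant analysis, built on the data's scatter matrices, maps the data into the most discriminative low-dimensional subspace. This record does not describe the authors' augmented matrix, so what follows is only a minimal sketch of textbook Fisher LDA, under stated assumptions. First, class labels `y` are assumed to be available, because standard LDA needs them; how the paper handles unlabelled data is not stated here. Second, the projection keeps the leading eigenvectors of pinv(S_W) S_B. The function names `scatter_matrices` and `lda_transform` and the synthetic data are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (S_W) and between-class (S_B) scatter matrices."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_W += centered.T @ centered               # spread of class c around its own mean
        diff = (mean_c - overall_mean)[:, None]    # d x 1 column vector
        S_B += len(Xc) * (diff @ diff.T)           # spread of class means around the global mean
    return S_W, S_B

def lda_transform(X, y, n_components):
    """Project X onto the n_components leading eigenvectors of pinv(S_W) @ S_B."""
    S_W, S_B = scatter_matrices(X, y)
    # pinv() instead of inv() so a singular S_W (e.g. more features than samples) does not crash
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]         # most discriminative directions first
    W = eigvecs[:, order[:n_components]].real
    return X @ W, eigvals.real[order]

# Usage on hypothetical synthetic data: three labelled 5-D Gaussian classes
rng = np.random.default_rng(0)
class_means = np.array([[0, 0, 0, 0, 0], [4, 0, 0, 0, 0], [0, 6, 0, 0, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(50, 5)) for m in class_means])
y = np.repeat([0, 1, 2], 50)
Z, eigvals = lda_transform(X, y, n_components=2)
print(Z.shape)  # (150, 2)
```

With C classes, S_B has rank at most C - 1, so at most C - 1 eigenvalues are nonzero. That is why two components are enough for the three classes in this example.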

Keywords: clustering quality; scatter matrix; linear discriminant analysis; min-max rule; silhouette analysis
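The abstract proposes a min-max rule for initialization and per-cluster silhouette coefficients for judging clustering quality, but it does not spell out the rule. The sketch below assumes the rule resembles common farthest-first (maximin) seeding, in which each new seed is the data point farthest from its nearest existing seed. Under that assumption, and provided the data contain at least k distinct points, every seed is a distinct data point, so no cluster is empty after the first assignment. Silhouettes follow the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from sample i to the other members of its cluster, and b(i) is the smallest mean distance from i to any other cluster. The names `maximin_init`, `kmeans` and `silhouette_values` and the synthetic blobs are hypothetical. This is not the authors' code.

```python
import numpy as np

def maximin_init(X, k):
    """Farthest-first ("max-min") seeding: each new seed is the data point whose
    distance to its nearest existing seed is largest."""
    # Hypothetical starting rule: the point farthest from the data mean
    first = np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))
    seeds = [X[first]]
    for _ in range(1, k):
        S = np.array(seeds)
        nearest = np.linalg.norm(X[:, None, :] - S[None, :, :], axis=2).min(axis=1)
        seeds.append(X[np.argmax(nearest)])
    return np.array(seeds)

def kmeans(X, k, max_iter=100):
    """Lloyd's K-means started from maximin seeds."""
    centers = maximin_init(X, k)
    for _ in range(max_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the previous center if a cluster ever ends up empty
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def silhouette_values(X, labels):
    """Per-sample silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue                                   # convention: s(i) = 0 for a singleton
        a = D[i, same].sum() / (n_same - 1)           # D[i, i] = 0, so this averages over the others
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return s

# Hypothetical data: three 2-D Gaussian blobs; scan k and compare silhouettes
rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([rng.normal(c, 0.7, size=(40, 2)) for c in blob_centers])

mean_silhouette = {}
for k in range(2, 7):
    labels, _ = kmeans(X, k)
    s = silhouette_values(X, labels)
    per_cluster = [round(float(s[labels == j].mean()), 2) for j in range(k) if np.any(labels == j)]
    mean_silhouette[k] = float(s.mean())
    print(f"k={k}  mean={mean_silhouette[k]:.3f}  per-cluster={per_cluster}")

best_k = max(mean_silhouette, key=mean_silhouette.get)
```

This sketch does not draw silhouette plots. Instead, it prints the overall and per-cluster mean silhouettes for k = 2 to 6. The k with the highest mean silhouette is the natural candidate for the inherent number of clusters, which should be k = 3 for these three well-separated blobs.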

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
