基于多层次密度中心图的聚类算法

A clustering algorithm based on the multi-level density center graph

作　　者：卢建云邵俊明 LU Jianyun;SHAO Junming(School of Computer Science and Engineering(School of Cybersecurity),University of Electronic Science and Technology of China,Chengdu 611731;Artificial Intelligence and Big Data College,Chongqing Polytechnic University of Electronic Technology,Chongqing 401331,China)

机构地区：[1]电子科技大学计算机科学与工程学院(网络空间安全学院),四川成都611731 [2]重庆电子科技职业大学人工智能与大数据学院,重庆401331

出　　处：《计算机工程与科学》2025年第2期327-335,共9页Computer Engineering & Science

基　　金：国家自然科学基金(62376054);重庆市教委科学技术研究项目(KJQN202103109)。

摘　　要：密度聚类是一种依据数据对象之间的密度关系进行聚类的算法。密度聚类通过判断数据集中低密度对象与密度中心对象的隶属关系实现对数据集的划分,能够有效地处理数据集中各种大小、不同形状和密度的簇。然而,受到数据集变密度、噪声和复杂分布的影响,如何准确估计数据对象的局部密度并通过密度中心确定聚类数目仍是需要研究的问题。针对上述密度聚类问题提出一种多层次密度中心图的聚类算法CMDCG。首先,基于每个数据对象的邻域,利用信息熵计算其局部密度;其次,依据局部密度和邻域空间确定每个数据对象的隶属关系并确定密度中心;最后,通过变化邻域空间得到多层次密度中心,根据多层次密度中心的隶属关系构建图结构,得到图的连通分量即为初始聚类,其他数据对象根据隶属关系划归到对应的初始聚类。在人工和真实数据集上的实验结果表明,CMDCG算法能够准确地识别聚类数目并形成正确的初始聚类,算法对变密度和噪声情况下的数据集有很好的鲁棒性。Density-based clustering is an algorithm that partitions a dataset based on the density relationships among data objects.By determining the membership relationships between low-density objects and density-center objects within the dataset,density-based clustering can effectively handle clusters of various sizes,shapes,and densities.However,due to the impact of variable densities,noise and complex distributions within datasets,how to accurately estimate the local density of data objects and determine the number of clusters through density centers remain challenges that require further research.To address these issues in density-based clustering,a clustering algorithm based on the multi-level density center graph(CMDCG)is proposed.Firstly,the local density of each data object is calculated using information entropy based on its neighborhood.Secondly,the membership relationships of each data object are statistically analyzed according to its local density and neighborhood space,and density centers are determined.Finally,multi-level density centers are obtained by varying the neighborhood space,and a graph structure is constructed based on the membership relationships among these multi-level density centers.The connected components of the graph are identified as initial clusters,and other data objects are assigned to these initial clusters based on their membership relationships.Experimental results on both synthetic and real dataset demonstrate that the CMDCG algorithm can accurately identify the number of clusters and form correct initial clusters,with clustering results that are robust to varying densities and noise.

关键词：密度聚类多层次密度中心连通图信息熵邻域空间

分类号：TP301[自动化与计算机技术—计算机系统结构]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于多层次密度中心图的聚类算法

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于多层次密度中心图的聚类算法

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索