检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:卢建云 邵俊明 LU Jianyun;SHAO Junming(School of Computer Science and Engineering(School of Cybersecurity),University of Electronic Science and Technology of China,Chengdu 611731;Artificial Intelligence and Big Data College,Chongqing Polytechnic University of Electronic Technology,Chongqing 401331,China)
机构地区:[1]电子科技大学计算机科学与工程学院(网络空间安全学院),四川成都611731 [2]重庆电子科技职业大学人工智能与大数据学院,重庆401331
出 处:《计算机工程与科学》2025年第2期327-335,共9页Computer Engineering & Science
基 金:国家自然科学基金(62376054);重庆市教委科学技术研究项目(KJQN202103109)。
摘 要:密度聚类是一种依据数据对象之间的密度关系进行聚类的算法。密度聚类通过判断数据集中低密度对象与密度中心对象的隶属关系实现对数据集的划分,能够有效地处理数据集中各种大小、不同形状和密度的簇。然而,受到数据集变密度、噪声和复杂分布的影响,如何准确估计数据对象的局部密度并通过密度中心确定聚类数目仍是需要研究的问题。针对上述密度聚类问题提出一种多层次密度中心图的聚类算法CMDCG。首先,基于每个数据对象的邻域,利用信息熵计算其局部密度;其次,依据局部密度和邻域空间确定每个数据对象的隶属关系并确定密度中心;最后,通过变化邻域空间得到多层次密度中心,根据多层次密度中心的隶属关系构建图结构,得到图的连通分量即为初始聚类,其他数据对象根据隶属关系划归到对应的初始聚类。在人工和真实数据集上的实验结果表明,CMDCG算法能够准确地识别聚类数目并形成正确的初始聚类,算法对变密度和噪声情况下的数据集有很好的鲁棒性。Density-based clustering is an algorithm that partitions a dataset based on the density relationships among data objects.By determining the membership relationships between low-density objects and density-center objects within the dataset,density-based clustering can effectively handle clusters of various sizes,shapes,and densities.However,due to the impact of variable densities,noise and complex distributions within datasets,how to accurately estimate the local density of data objects and determine the number of clusters through density centers remain challenges that require further research.To address these issues in density-based clustering,a clustering algorithm based on the multi-level density center graph(CMDCG)is proposed.Firstly,the local density of each data object is calculated using information entropy based on its neighborhood.Secondly,the membership relationships of each data object are statistically analyzed according to its local density and neighborhood space,and density centers are determined.Finally,multi-level density centers are obtained by varying the neighborhood space,and a graph structure is constructed based on the membership relationships among these multi-level density centers.The connected components of the graph are identified as initial clusters,and other data objects are assigned to these initial clusters based on their membership relationships.Experimental results on both synthetic and real dataset demonstrate that the CMDCG algorithm can accurately identify the number of clusters and form correct initial clusters,with clustering results that are robust to varying densities and noise.
关 键 词:密度聚类 多层次密度中心 连通图 信息熵 邻域空间
分 类 号:TP301[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.120