Adaptive Density Peak Clustering Algorithm Based on Gaussian Distribution


Authors: LI Qiwen (李启文); WANG Zhihe (王治和); DU Hui (杜辉); LU Depeng (鲁德鹏) (School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China)

Affiliation: [1] School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China

Source: Computer Engineering (《计算机工程》), 2025, No. 4, pp. 137-148 (12 pages)

Funding: National Natural Science Foundation of China (62372353).

Abstract: The Density Peak Clustering (DPC) algorithm can discover clusters of arbitrary shape and is robust to noise, so it is widely used across many fields. However, DPC requires cluster centers to be selected manually and performs poorly on datasets with non-uniform density. To address these problems, this paper proposes an adaptive density peak clustering algorithm based on the Gaussian distribution. First, the product θ_i of the local density and the relative distance is computed for each data point; Z-score standardization maps θ_i into a two-dimensional space that follows a Gaussian distribution, and the standard deviation of that distribution is used to select cluster centers adaptively, yielding the set of cluster centers. Second, each remaining data point is assigned to the cluster of its nearest cluster center, giving a preliminary partition. Finally, a suture factor model is designed to compute suture coefficients between clusters: when the suture coefficient exceeds a threshold, the most similar clusters in the preliminary partition are merged and the similarity matrix is updated, and this repeats until merging is complete and the final result is obtained. Experimental results on synthetic and real-world datasets show that, compared with the DBSCAN, DPC, and ICKDC algorithms, the proposed algorithm achieves higher clustering accuracy and better clustering performance.
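The first two steps of the abstract (center selection from the Z-scored product θ_i, then nearest-center assignment) can be illustrated with a minimal Python sketch. Several details are assumptions, because the abstract does not specify them: the Gaussian-kernel form of the local density, the cutoff distance `dc`, the rule "select a point when its Z-score exceeds `k` standard deviations", and all function names. The suture factor merging step is not sketched, since the abstract does not define the model.

```python
import math
from statistics import mean, pstdev


def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def local_density(dist, dc):
    """Gaussian-kernel local density rho_i (assumed; a common DPC variant)."""
    return [
        sum(math.exp(-((d / dc) ** 2)) for j, d in enumerate(row) if j != i)
        for i, row in enumerate(dist)
    ]


def relative_distance(dist, rho):
    """delta_i: distance to the nearest point of strictly higher density.

    A point with no denser neighbour gets its largest distance instead.
    """
    n = len(rho)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return delta


def select_centers(rho, delta, k=2.0):
    """Indices whose Z-scored theta_i = rho_i * delta_i exceeds k.

    The threshold k is an assumption, not a value taken from the paper.
    """
    theta = [r * d for r, d in zip(rho, delta)]
    mu, sigma = mean(theta), pstdev(theta)
    if sigma == 0:
        return []  # all theta equal: no point stands out
    return [i for i, t in enumerate(theta) if (t - mu) / sigma > k]


def assign_to_nearest_center(dist, centers):
    """Label every point with the index of its nearest cluster center."""
    return [min(centers, key=lambda c: dist[i][c]) for i in range(len(dist))]


def cluster(points, dc=1.0, k=2.0):
    """Steps 1 and 2 only; the suture-factor merge is not sketched here."""
    dist = pairwise_distances(points)
    rho = local_density(dist, dc)
    delta = relative_distance(dist, rho)
    centers = select_centers(rho, delta, k)
    if not centers:
        raise ValueError("no point exceeded the threshold; try a smaller k")
    return centers, assign_to_nearest_center(dist, centers)
```

Calling `cluster(points, dc=1.0, k=2.0)` on a list of coordinate tuples returns the selected center indices and, for each point, the index of the center it was assigned to. Standardizing θ_i means the cut is expressed in standard deviations instead of being read off a decision graph by hand, which is the manual step the abstract says the method removes.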

Keywords: density peak clustering algorithm; Gaussian distribution; Z-score standardization; suture factor; inter-cluster similarity

Classification number: TP301.6 [Automation and Computer Technology - Computer System Architecture]

 
