Robust Spectral Clustering Based on Density Distribution (基于密度分布的鲁棒谱聚类算法)


Authors: LI Chao (李超), LIAO Hong-Mei (廖红梅) [1,2], XU Xiao (徐晓), GUO Li-Li (郭丽丽), DING Shi-Fei (丁世飞) [1,2]

Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; [2] Mine Digitization Engineering Research Center, Ministry of Education (China University of Mining and Technology), Xuzhou, Jiangsu 221116

Source: Chinese Journal of Computers (《计算机学报》), 2024, No. 11, pp. 2645-2663 (19 pages)

Funding: Supported by the National Natural Science Foundation of China (62276265, 61976216).

Abstract: Spectral clustering is a graph-theoretic clustering method that uses a similarity matrix to eigendecompose the data or project it into a low-dimensional space in order to obtain a better partition. It has attracted wide attention because it handles complex data and non-convex clusters, and it has been applied successfully in many fields. However, high computational complexity and sensitivity to noise limit further improvement of its clustering performance. To address these problems, this paper proposes a robust spectral clustering algorithm based on density distribution. First, a noise coefficient is set to filter out a small number of low-density noise points. Second, drawing on a property of density peak clustering, namely that partitioning the data into as many subclusters as possible ensures label consistency within each subcluster, the new algorithm strikes a balance between fewer subclusters and higher within-cluster label consistency, yielding a higher-quality partition of the data. Finally, a similarity measure based on the density distribution between clusters improves spectral clustering on datasets with non-uniform density. Experiments on synthetic and real data demonstrate the effectiveness of the new algorithm in comparison with 9 recent improved algorithms. While maintaining clustering efficiency, the new algorithm improves the average accuracy, adjusted Rand index, and adjusted mutual information on real data by at least 10.02%, 22.11%, and 15.76%, respectively.

Spectral clustering, as a classic clustering method based on graph theory, uses the similarity matrix to decompose the data or project the data into a low-dimensional space to achieve a better data partition. In spectral clustering, the similarity matrix of the data is constructed first, with the similarity between data points usually calculated by the Gaussian kernel function or the k-nearest neighbors method. Then the similarity matrix is transformed into a Laplacian matrix, the Laplacian matrix is eigendecomposed, and the resulting eigenvectors are clustered by the k-means algorithm. Finally, each data point is assigned to a cluster according to the clustering results. Spectral clustering is of great significance in the fields of data mining and pattern recognition. It is suitable not only for clustering problems but also for graph segmentation, dimensionality reduction, feature selection, and other fields, so it has wide application value. However, the computational complexity of spectral clustering is high, which may limit it when dealing with large-scale data sets. In addition, spectral clustering is sensitive to noise, because noisy data points may affect the construction of the similarity matrix and the calculation of the eigenvectors, resulting in instability and lower accuracy of the clustering results. Especially when no noise preprocessing or denoising is applied, spectral clustering may incorrectly assign noisy data points to a cluster, affecting the final clustering results. Therefore, when dealing with data containing noise, it is necessary to properly clean or denoise the data before using spectral clustering to improve the effect. To address these problems, this paper proposes a robust spectral clustering algorithm based on density distribution. Firstly, the noise points between subclusters have lower local density; therefore, this paper sets the noise coefficient to filter a small number of low-density noise points from the perspective of different de
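The classic pipeline summarized in the abstract (Gaussian-kernel similarity, Laplacian, eigendecomposition, k-means on the eigenvectors) can be sketched roughly as follows. This is only an illustration of standard normalized spectral clustering, not the density-based algorithm the paper proposes. The function names, the `sigma` bandwidth, the choice of the symmetric normalized Laplacian with row normalization, and the farthest-point k-means initialization are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Dense Gaussian-kernel similarity matrix with a zero diagonal."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W, n_clusters):
    """Row-normalized eigenvectors for the smallest eigenvalues of L_sym."""
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    U = eigvecs[:, :n_clusters]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)


def kmeans(points, n_clusters, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, n_clusters, sigma=1.0):
    """Similarity matrix -> Laplacian eigenvectors -> k-means labels."""
    W = gaussian_similarity(X, sigma)
    U = spectral_embedding(W, n_clusters)
    return kmeans(U, n_clusters)


if __name__ == "__main__":
    # Toy data (hypothetical): two tight, well-separated 2-D blobs.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
    print(spectral_clustering(X, n_clusters=2))
```

The dense n-by-n similarity matrix and the full eigendecomposition in this sketch are what make the baseline expensive on large data sets, and every point, noisy or not, contributes to `W`, which is consistent with the cost and noise-sensitivity concerns the abstract raises.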

Keywords: spectral clustering; density distribution; subcluster similarity; local peak; noise detection

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
