Adaptive Spectral Clustering Algorithm Combining Shared Nearest Neighbors and Manifold Distance

Cited by: 1


Authors: ZHANG Ximei [1,3]; XIE Bin [1,2,3]; MI Jusheng [4]; XU Tongtong; ZHANG Yiling

Affiliations: [1] College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; [2] Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Hebei Normal University, Shijiazhuang 050024, China; [3] Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China; [4] School of Mathematical Sciences, Hebei Normal University, Shijiazhuang 050024, China

Source: Computer Science (《计算机科学》), 2023, No. 10, pp. 59-70 (12 pages)

Funding: National Natural Science Foundation of China (62076088); Beijing Natural Science Foundation (Z210002).

Abstract: Spectral clustering is built on graph theory and recasts clustering as a graph partitioning problem. It can identify clusters of arbitrary shape and is easy to implement, so it adapts to more kinds of data than traditional clustering algorithms. However, the distance measures commonly used in spectral clustering cannot account for global and local consistency at the same time and are sensitive to noise. In addition, the clustering result depends on the similarity matrix constructed from the input data, and the two-step strategy, in which a relaxed partition matrix is obtained by eigendecomposition and then discretized independently, rarely yields a jointly optimal solution. To address these issues, an adaptive spectral clustering algorithm combining shared nearest neighbors and manifold distance (SNN-MSC) is proposed. A new manifold distance with an exponential term and a scaling factor is introduced, which makes it possible to flexibly adjust the ratio between the similarity of data within the same manifold and the similarity of data across different manifolds. A density factor is incorporated into the manifold distance to eliminate the influence of noise. Shared nearest neighbors are used to redefine the similarity measure, which captures the spatial structure and local relationships among data points. Meanwhile, a rank constraint is imposed on the Laplacian matrix so that the number of connected components in the similarity matrix exactly equals the number of clusters. The similarity matrix and the clustering structure are thus optimized adaptively during the solution process, and no separate discretization step is needed. Comparative experiments on synthetic datasets and real-world UCI datasets show that the proposed algorithm performs better on multiple clustering validity indices.
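The abstract does not give the paper's formulas, so the following is only a minimal sketch of two ideas it mentions, under stated assumptions. It assumes a common shared-nearest-neighbor similarity (the number of k nearest neighbors two points have in common, divided by k), which may differ from the SNN-MSC definition, and it omits the manifold distance and density factor entirely. It also illustrates the standard fact behind the rank constraint: for a nonnegative symmetric similarity matrix, the multiplicity of the zero eigenvalue of the Laplacian equals the number of connected components. The function names `knn_indices`, `snn_similarity` and `count_components` are illustrative and do not come from the paper.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)      # pairwise squared Euclidean distances
    np.fill_diagonal(d2, np.inf)         # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def snn_similarity(X, k):
    """Assumed SNN similarity: |kNN(i) & kNN(j)| / k, with a zero diagonal."""
    n = X.shape[0]
    idx = knn_indices(X, k)
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], idx] = 1   # member[i, m] = 1 if m is in kNN(i)
    shared = member @ member.T               # shared[i, j] = number of common neighbors
    np.fill_diagonal(shared, 0)
    return shared / k


def count_components(S, tol=1e-8):
    """Count connected components as the number of (near-)zero Laplacian eigenvalues."""
    L = np.diag(S.sum(axis=1)) - S           # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)          # L is symmetric, so eigvalsh applies
    return int(np.sum(eigvals < tol))


# Toy data (hypothetical): two well-separated groups of six collinear points.
line = np.arange(6, dtype=float)
X = np.concatenate([
    np.column_stack([line, np.zeros(6)]),
    np.column_stack([line + 100.0, np.zeros(6)]),
])
S = snn_similarity(X, k=2)
print(count_components(S))
```

On this toy input no point shares a neighbor with a point from the other group, so the similarity graph splits into two components and the Laplacian has rank n - 2. A rank-constrained method aims for this block structure directly, which is why the cluster labels can be read from the connected components without a separate discretization step.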

Keywords: spectral clustering; manifold distance; shared nearest neighbors; rank constraint; adaptive

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
