Density-based Data Clustering Algorithm in Multi-metric Spaces

Authors: ZHU Yi-Fan (朱轶凡); LUO Cheng-Yang (罗程阳); MA Rui-Yao (马瑞遥); CHEN Lu (陈璐); MAO Yu-Ren (毛玉仁); GAO Yun-Jun (高云君)

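Affiliations: [1] College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China; [2] School of Software Technology, Zhejiang University, Ningbo 315048, Zhejiang, China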

Source: Journal of Software (《软件学报》), 2025, Issue 2, pp. 851-873 (23 pages)

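Funding: National Key Research and Development Program of China (2021YFC3300303); National Natural Science Foundation of China (62025206, 61972338, 62102351); Hangzhou Major Science and Technology Innovation Project in Artificial Intelligence (2022AIZD0116).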

Abstract: Density-based spatial clustering of applications with noise (DBSCAN) is one of the classic clustering methods in data mining. It can discover complex relationships hidden in the data while filtering out noise, which yields high-quality clusters. However, existing density-based clustering algorithms support only unimodal (single-type) data and struggle in applications where multiple modalities coexist. With the rapid development of information technology, real-world data is increasingly multi-modal: an object is no longer of a single type but a combination of modalities such as text, images, geographical coordinates, and data features. Existing clustering methods therefore fail to model complex multi-modal data effectively and cannot cluster it efficiently. To address these issues, this study proposes a density-based clustering algorithm in multi-metric spaces. First, to characterize the complex relationships within multi-modal data, a multi-metric space is used to quantify the similarity between objects, and an aggregated multi-metric graph (AMG) index is used to model the multi-modal data. Next, differential distances are used to optimize the structure of the aggregated multi-metric graph, and a best-first search strategy is combined with pruning techniques to achieve efficient multi-modal data clustering. Finally, experiments on real and synthetic datasets under a variety of parameter settings show that the proposed method improves running efficiency by at least one order of magnitude while achieving high clustering accuracy and good scalability.
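
To make the idea of density-based clustering in a multi-metric space concrete, the following is a minimal sketch, not the paper's method. It assumes that each object is a tuple with one component per modality, that each modality has its own metric, and that the per-modality distances are combined by a weighted sum; the abstract does not specify the aggregation, so this is purely an illustrative choice. Neighborhoods are found by brute force, so the sketch omits the AMG index, the differential-distance optimization, and the best-first search with pruning that the abstract describes. All function names and the sample data are hypothetical.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence, Tuple

Metric = Callable[[object, object], float]
NOISE = -1


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Metric for a coordinate modality."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Metric for a set-valued modality (e.g. tags or tokens)."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def aggregate_distance(p: Tuple, q: Tuple, metrics: Sequence[Metric],
                       weights: Sequence[float]) -> float:
    """Weighted sum of per-modality distances (an assumed aggregation)."""
    return sum(w * m(x, y) for w, m, x, y in zip(weights, metrics, p, q))


def dbscan(objects: Sequence[Tuple], metrics: Sequence[Metric],
           weights: Sequence[float], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN with brute-force range queries over the aggregate distance."""
    n = len(objects)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def region(i: int) -> List[int]:
        # All objects within eps of object i (includes i itself).
        return [j for j in range(n)
                if aggregate_distance(objects[i], objects[j], metrics, weights) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


# Hypothetical two-modality objects: (coordinates, tag set).
objects = [
    ((0.0, 0.0), frozenset({"a", "b"})),
    ((0.1, 0.0), frozenset({"a", "b"})),
    ((0.0, 0.1), frozenset({"a", "b", "c"})),
    ((5.0, 5.0), frozenset({"x", "y"})),
    ((5.1, 5.0), frozenset({"x", "y"})),
    ((5.0, 5.1), frozenset({"x"})),
    ((10.0, 0.0), frozenset({"q"})),
]
labels = dbscan(objects, [euclidean, jaccard_distance], [0.5, 0.5], eps=0.35, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Each range query here scans every object, so the sketch costs a quadratic number of distance computations. That cost is what an index structure and pruning of the kind the abstract mentions would aim to reduce.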

Keywords: multi-metric space; multi-metric graph; density-based data clustering; data mining; multi-modal data

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
