Big data clustering method based on improved K-means++ and DBSCAN

Cited by: 8

Authors: Zhang Yuqin[1], Liang Li[2], Zhang Jianliang[1], Feng Xiangdong[1]

Affiliations: [1] The Engineering & Technical College of Chengdu University of Technology, Leshan 614000, China; [2] School of Mathematics and Physics, Chengdu University of Technology, Chengdu 610059, China

Source: Foreign Electronic Measurement Technology (《国外电子测量技术》), 2022, No. 9, pp. 40-46 (7 pages)

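Funding: Supported by the Sichuan Province Natural Science Key Project (18ZA0075, 18ZA0073); the Key Research Project of the Leshan Science and Technology Bureau (21GZD015); and the Fund of The Engineering & Technical College of Chengdu University of Technology (C122019027).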

Abstract: To improve processing performance on large-scale data sets, a big data clustering method is proposed that combines an improved K-means++ algorithm with density-based spatial clustering of applications with noise (DBSCAN). First, K-means++ is combined with a local search strategy to produce an initial partition of the data set; DBSCAN is then run independently within each partition. The improved K-means++ algorithm raises the quality of data preprocessing, and partitioned, parallel clustering significantly reduces the computational burden of DBSCAN and speeds up processing. Finally, a two-stage pruning strategy merges border clusters efficiently. Experimental results show that the proposed method greatly reduces the execution time of DBSCAN while keeping clustering quality very close to that of the original DBSCAN algorithm. On the Bitcoin data set from the UCI repository, its clustering efficiency is more than 10 times higher than that of the other compared methods, achieving the best balance between processing time and clustering quality.
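
The three stages named in the abstract (partition, per-partition DBSCAN, merge of border clusters) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the partitioning step uses plain K-means++ seeding with Lloyd iterations rather than the paper's local-search variant, the merge step is a naive pairwise distance check rather than the two-stage pruning strategy, and partitions are processed sequentially rather than in parallel. The function names `kmeanspp_partition`, `dbscan`, and `cluster_partitions` are hypothetical.

```python
import math
import random


def kmeanspp_partition(points, k, rng, iters=10):
    """Stage 1 (assumed variant): K-means++ seeding plus Lloyd iterations."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Pick the next seed with probability proportional to the squared
        # distance from the nearest existing seed (needs > k distinct points).
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2)[0])

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a partition is empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return [nearest(p) for p in points]


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            elif labels[j] is None:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    stack.extend(neighbors[j])  # only core points expand the cluster
    return labels


def cluster_partitions(points, assign, eps, min_pts):
    """Stages 2 and 3: DBSCAN inside each partition, then a naive cross-partition merge."""
    clusters = []  # (partition id, list of global point indices)
    for part in sorted(set(assign)):
        idx = [i for i, a in enumerate(assign) if a == part]
        local = dbscan([points[i] for i in idx], eps, min_pts)
        for c in range(max(local) + 1):
            clusters.append((part, [i for i, lab in zip(idx, local) if lab == c]))

    parent = list(range(len(clusters)))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # Naive merge (no pruning): join clusters from different partitions
    # whenever any two of their points lie within eps of each other.
    for a, (pa, ma) in enumerate(clusters):
        for b, (pb, mb) in enumerate(clusters[:a]):
            if pa != pb and any(math.dist(points[i], points[j]) <= eps
                                for i in ma for j in mb):
                parent[find(a)] = find(b)

    labels = [-1] * len(points)
    for c, (_, members) in enumerate(clusters):
        for i in members:
            labels[i] = find(c)  # cluster ids are arbitrary, not contiguous
    return labels
```

A typical call would be `assign = kmeanspp_partition(points, k, random.Random(0))` followed by `cluster_partitions(points, assign, eps, min_pts)`, with `points` given as a list of coordinate tuples. In this simplified sketch, a point that is noise inside its own partition stays noise even if plain DBSCAN on the full data set would attach it to a cluster across the partition border, so results near borders can differ from the original algorithm.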

Keywords: big data; data clustering; DBSCAN; K-means++; local search

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]

