Authors: Zhang Yuqin[1], Liang Li[2], Zhang Jianliang[1], Feng Xiangdong[1]
Affiliations: [1] The Engineering & Technical College of Chengdu University of Technology, Leshan 614000, China; [2] School of Mathematics and Physics, Chengdu University of Technology, Chengdu 610059, China
Source: Foreign Electronic Measurement Technology (《国外电子测量技术》), 2022, No. 9, pp. 40-46 (7 pages)
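Funding: Key Natural Science Project of Sichuan Province (18ZA0075, 18ZA0073); Key Research Project of the Leshan Science and Technology Bureau (21GZD015); Fund of the Engineering & Technical College of Chengdu University of Technology (C122019027).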
Abstract: To improve processing performance on large-scale data sets, a big-data clustering method is proposed that combines an improved K-means++ algorithm with density-based spatial clustering of applications with noise (DBSCAN). First, K-means++ is combined with a local search strategy to produce an initial partitioning of the data set; DBSCAN is then run independently within each partition. The improved K-means++ raises the quality of this preprocessing step, and partitioning the data so that the partitions can be clustered in parallel substantially reduces the computational burden of DBSCAN and speeds up processing. Finally, a two-step pruning strategy efficiently merges the clusters that lie on partition borders. Experimental results show that the proposed method greatly reduces the execution time of DBSCAN while producing clusters whose quality is very close to that of the original DBSCAN algorithm. On the Bitcoin data set from the UCI repository, its clustering efficiency is more than 10 times that of the other compared methods, and it achieves the best trade-off between processing time and clustering quality among the methods compared.
Keywords: big data; data clustering; DBSCAN; K-means++; local search
CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
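The abstract describes the pipeline only at a high level (seed, partition, cluster locally, merge borders). The following is a minimal pure-Python sketch of that shape, written under several assumptions that do not come from the paper. The local search step follows the D²-sampling swap heuristic of Lattanzi and Sohler (2019). Partitions are taken to be the Voronoi cells of the seeds. The two pruning steps shown (keep only core points within eps of the bisector between two cells, then compare only candidates facing each other across the same border) are stand-ins for the authors' unspecified two-step strategy. All function names (`seed_centers`, `partitioned_dbscan`, etc.) are hypothetical. Neighbour search is brute force and the per-partition loop runs serially, both chosen for clarity. Because core status is decided inside each partition, labels near borders may differ slightly from a single global DBSCAN run.

```python
import math
import random
from collections import defaultdict

def dist2(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_cost(points, centers):
    # k-means objective: sum of squared distances to the nearest center.
    return sum(min(dist2(p, c) for c in centers) for p in points)

def seed_centers(points, k, swaps, rng):
    # K-means++ seeding: first center uniform, the rest by D^2 sampling.
    centers = [rng.choice(points)]
    while len(centers) < k:
        w = [min(dist2(p, c) for c in centers) for p in points]
        if sum(w) == 0:  # fewer distinct points than k
            break
        centers.append(rng.choices(points, weights=w)[0])
    # Assumed local search: D^2-sample a candidate, swap it for the center whose replacement lowers the cost most.
    for _ in range(swaps):
        w = [min(dist2(p, c) for c in centers) for p in points]
        if sum(w) == 0:
            break
        cand = rng.choices(points, weights=w)[0]
        best, best_cost = None, kmeans_cost(points, centers)
        for i in range(len(centers)):
            trial = centers[:i] + [cand] + centers[i + 1:]
            c = kmeans_cost(points, trial)
            if c < best_cost:
                best, best_cost = trial, c
        if best is not None:  # keep the swap only if it lowers the cost
            centers = best
    return centers

def dbscan(points, eps, min_pts):
    # Plain O(n^2) DBSCAN; returns labels (-1 = noise) and core flags.
    n, e2 = len(points), eps * eps
    nbrs = [[j for j in range(n) if dist2(points[i], points[j]) <= e2] for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]  # neighbourhood includes the point itself
    labels, cid = [-1] * n, 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i], stack = cid, [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in nbrs[p]:
                if labels[q] == -1:
                    labels[q] = cid
                    stack.append(q)
        cid += 1
    return labels, core

def partitioned_dbscan(points, k, eps, min_pts, swaps=10, seed=0):
    centers = seed_centers(points, k, swaps, random.Random(seed))
    m = len(centers)
    # Partition: each point goes to its nearest seed (a Voronoi cell).
    owner = [min(range(m), key=lambda c: dist2(p, centers[c])) for p in points]
    parts = defaultdict(list)
    for idx, j in enumerate(owner):
        parts[j].append(idx)
    # Local DBSCAN per partition; the calls are independent, so they could run in parallel.
    label, is_core = [None] * len(points), [False] * len(points)
    for j, idxs in parts.items():
        labels, core = dbscan([points[i] for i in idxs], eps, min_pts)
        for i, lab, c in zip(idxs, labels, core):
            label[i] = (j, lab) if lab != -1 else None
            is_core[i] = c
    # Pruning 1 (assumed): no point of cell j is closer to p than the i/j bisector.
    near = defaultdict(list)
    for idx in [x for x in range(len(points)) if is_core[x]]:
        p, i = points[idx], owner[idx]
        for j in range(m):
            gap = math.sqrt(dist2(centers[i], centers[j]))
            if j != i and gap > 0 and \
                    (dist2(p, centers[j]) - dist2(p, centers[i])) / (2 * gap) <= eps:
                near[(i, j)].append(idx)
    # Pruning 2 (assumed): compare only candidates facing each other across one border.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (i, j), cands in list(near.items()):
        others = near.get((j, i), []) if i < j else []
        for a in cands:
            for b in others:
                if dist2(points[a], points[b]) <= eps * eps:
                    parent[find(label[a])] = find(label[b])
    ids = {}  # relabel: consecutive global cluster ids, -1 for noise
    return [-1 if lab is None else ids.setdefault(find(lab), len(ids)) for lab in label]
```

With `min_pts=1` every point is a core point, so the output should equal the connected components of the eps-neighbourhood graph however the seeds fall. That makes it a convenient check that the border merge does not miss cross-partition pairs. For example, two parallel lines of points 5 units apart, split into `k=4` partitions, should come back as exactly two clusters.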