Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm


Authors: MAO Yi-min[1], GU Sen-qing (College of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)

Affiliation: [1] College of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China

Source: Journal of Jilin University: Engineering and Technology Edition, 2023, No. 10, pp. 2909-2916 (8 pages)

Funding: National Key Research and Development Program of China (2018YFC1504705); National Natural Science Foundation of China (41562019).

Abstract: In parallel density clustering, boundary points between clusters of different densities are assigned ambiguously and the data contain noise; both degrade clustering performance and leave the results prone to local optima. To address these problems, a parallel density clustering algorithm based on MapReduce and an optimized cuckoo algorithm, MCS-KDBSCAN, is proposed. First, the algorithm introduces the KDBSCAN (K-means DBSCAN) strategy, which draws on the nearest-neighbor and reverse-nearest-neighbor ideas in K-means: it computes the influence space of each data point and uses it to redefine the cluster expansion condition of the DBSCAN (density-based spatial clustering of applications with noise) algorithm, which avoids the ambiguous assignment of boundary points between clusters of different densities. Second, building on the nearest-neighbor idea in KDBSCAN, an iterative noise-point handling strategy is proposed to reduce the impact of noise points on clustering performance. Third, MCS (Majorization cuckoo search), an improvement on the traditional cuckoo algorithm, is proposed: it decays the weight of the nest-discovery probability as the number of search iterations increases, which speeds up convergence and addresses the problem of clustering results being trapped in local optima. Finally, combined with MapReduce, the parallel density clustering strategy MCS-KDBSCAN is proposed; parallelizing the density clustering computation reduces the communication overhead of passing local optimal solutions in the parallel clustering algorithm and improves its performance. Experiments show that the proposed MCS-KDBSCAN parallel density clustering algorithm performs better in terms of clustering accuracy and clustering running time.
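The abstract mentions an influence space built from nearest and reverse nearest neighbors but does not define it. A minimal sketch, assuming the definition commonly used in the density clustering literature (the influence space of a point is the intersection of its k nearest neighbors and its reverse k nearest neighbors), might look as follows. The function names, the parameter `k`, and the sample points are hypothetical and are not taken from the paper; the paper's cluster expansion condition itself is not reproduced here.

```python
from math import dist


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    # Ties in distance are broken by index so the result is deterministic.
    others.sort(key=lambda j: (dist(points[i], points[j]), j))
    return set(others[:k])


def influence_spaces(points, k):
    """Assumed definition: IS(p) = kNN(p) intersected with reverse kNN(p)."""
    knn = [k_nearest(points, i, k) for i in range(len(points))]
    # Reverse kNN of i: every j that lists i among its own k nearest neighbors.
    rknn = [set() for _ in points]
    for j, neighbours in enumerate(knn):
        for i in neighbours:
            rknn[i].add(j)
    return [knn[i] & rknn[i] for i in range(len(points))]


# Four points on a unit square plus one distant point (illustrative data only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (6.0, 5.0)]
spaces = influence_spaces(points, k=2)
for idx, space in enumerate(spaces):
    print(idx, sorted(space))
```

In this toy data set the distant point has an empty influence space, because no other point counts it among its two nearest neighbors. A density-based method could treat such points as noise candidates, although whether the paper uses exactly this criterion is not stated in the abstract.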

Keywords: density clustering; optimized cuckoo algorithm; density-based clustering algorithm; MapReduce; noise resistance

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
