基于KD树和混沌蜉蝣优化的并行谱聚类算法  被引量:2

Parallel spectral clustering algorithm using KD tree and chaotic mayfly optimization algorithm

在线阅读下载全文

作  者:胡健[1,2] 刘祥敏 毛伊敏 陈志刚[3] HU Jian;LIU Xiangmin;MAO Yimin;CHEN Zhigang(School of Information Engineering,Jiangxi University of Science and Technology,Ganzhou 341000,China;School of Information Engineering,Gannan University of Science and Technology,Ganzhou 341000,China;School of Computing,Central South University,Changsha 410000,China)

机构地区:[1]江西理工大学信息工程学院,江西赣州341000 [2]赣南科技学院信息工程学院,江西赣州341000 [3]中南大学计算机学院,湖南长沙410000

出  处:《计算机集成制造系统》2023年第12期4001-4020,共20页Computer Integrated Manufacturing Systems

基  金:国家自然科学基金资助项目(41562019);国家重点研发计划资助项目(2018YFC1504705);科技创新2030—“新一代人工智能”重大项目子课题(2020AAA0109605);江西省教育厅科技资助项目(GJJ151528,GJJ209405)。

摘  要:针对大数据环境下并行谱聚类算法存在的节点负载不均衡、冗余计算、矩阵相乘时间开销大以及初始簇中心敏感等问题,提出了基于KD(k-dimension)树和混沌蜉蝣优化算法的并行谱聚类算法(PSC-MO)。首先,提出基于采样的KD-tree数据分区策略(DPS)划分数据,保证了节点间负载均衡;其次,在构建稀疏相似矩阵过程中,提出优化的分区分配策略(OPA)和基于三角不等式的KD树剪枝策略以进行跨分区的t近邻搜索,避免了过多的冗余计算;然后,提出正规化定理,通过元素对应相乘的方式代替矩阵相乘以优化Laplacian矩阵正规化过程,有效地减少了时间开销;最后,提出混沌蜉蝣优化算法(CMO),得到最佳位置作为初始簇中心后进行k-means并行聚类,解决了算法对初始簇中心敏感的问题。实验表明,PSC-MO算法不但具有良好的聚类效果,而且在大规模数据集上表现出了良好的数据和系统可扩展性。In the big data environment,the parallel spectral clustering algorithm suffers from imbalanced node load,redundant computation,high overhead for matrix multiplication and initially cluster center sensitivity.To address these problems,a Parallel Spectral Clustering algorithm using KD-tree and chaotic Mayfly Optimization algorithm(PSC-MO)was proposed.The KD-tree data partitioning strategy based on sampling was proposed to divide the data and ensure the load balancing among nodes.During the construction of the sparse similarity matrix,an Optimized Partition Assignment strategy(OPA)and two KD-tree pruning strategies based on triangular inequalities were proposed to perform t nearest neighbor search across partitions,avoiding excessive redundant computations.Then,the regularization theorem was proposed to optimize the Laplacian matrix regularization processes by multiplying the elements correspondingly instead of matrix multiplication,which effectively reduced the time overhead.The Chaotic Mayfly Optimization Algorithm(CMO)was proposed to obtain the best position as the initial cluster center for k-means parallel clustering,which solved the problem that the parallel spectral clustering algorithm was sensitive to the initial cluster center.Experimental results showed that the PSC-MO algorithm not only had excellent clustering performance,but also exhibited both outstanding data scalability and system scalability with large-scale datasets.

关 键 词:大数据 并行化 MAPREDUCE模型 谱聚类 KD树 混沌蜉蝣优化算法 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象