Parallel clustering algorithm for big data (面向大数据的并行聚类算法)

Cited by: 3

Authors: LIU Jie-fang (刘解放) [1]; ZHANG Zhi-hui (张志辉) [2]

Affiliations: [1] School of Transportation and Information, Hubei Communications Technical College, Wuhan 430079, China; [2] School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2021, No. 8, pp. 2265-2270 (6 pages)

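Funding: Natural Science Foundation of Jiangsu Province (BK20181339); Key Project of the University Industry-Academia-Research Innovation Fund, Science and Technology Development Center of the Ministry of Education (2018A02005).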

Abstract: In the era of big data, the high computational complexity of the CLUBS algorithm leads to low training efficiency. To address this problem, a parallel clustering algorithm for large-scale data, CLUBS‖, is proposed. By integrating the idea of CLUBS into the MapReduce parallel computing framework, the algorithm processes data in parallel and improves computational efficiency. The parallelization of several key computations is analyzed theoretically in some depth, and an implementation of CLUBS‖ based on ad-hoc message passing is given. Experimental results verify the effectiveness of the proposed method.
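The abstract does not say which key computations CLUBS‖ parallelizes, so the following is only a minimal sketch of the general map/reduce pattern it refers to, not the authors' implementation. It assumes a sum-of-squares criterion of the kind commonly used in partition-based clustering. Each non-empty data partition is summarized independently (map) and the partial summaries are merged associatively (reduce). The function names `map_partition`, `reduce_stats`, and `ssq` are hypothetical.

```python
from functools import reduce

def map_partition(points):
    """Map step (hypothetical): summarize one non-empty partition as
    (count, per-dimension sum, sum of squared coordinates)."""
    dims = len(points[0])
    count = len(points)
    linear_sum = [sum(p[d] for p in points) for d in range(dims)]
    square_sum = sum(x * x for p in points for x in p)
    return count, linear_sum, square_sum

def reduce_stats(a, b):
    """Reduce step (hypothetical): merge two partial summaries.
    The merge is associative, so partitions can be combined in any order."""
    return a[0] + b[0], [x + y for x, y in zip(a[1], b[1])], a[2] + b[2]

def ssq(stats):
    """Sum of squared distances to the centroid, computed from the merged summary."""
    count, linear_sum, square_sum = stats
    return square_sum - sum(s * s for s in linear_sum) / count

# Two partitions that could live on different workers (toy data, assumed).
partitions = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(0.0, 2.0), (2.0, 2.0)],
]
merged = reduce(reduce_stats, map(map_partition, partitions))
total_ssq = ssq(merged)
```

Because only the small summaries cross partition boundaries, the same merge could in principle be carried out by a MapReduce job or by messages exchanged between workers.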

Keywords: big data; clustering; parallel computing; MapReduce; peer-to-peer network

Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]

 
