Distributed affinity propagation clustering algorithm based on GraphLab

Authors: CHEN Wenqiang (陈文强)[1], LIN Chen (林琛)[1,2], CHEN Ke (陈珂)[3], CHEN Jinxiu (陈锦秀)[1], ZOU Quan (邹权)[1,2]

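Affiliations: [1] School of Information Science and Technology, Xiamen University; [2] Shenzhen Research Institute of Xiamen University, Shenzhen, Guangdong 518057; [3] Department of Computer Science and Technology, Guangdong University of Petrochemical Technology, Maoming, Guangdong 525000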

Source: Journal of Shandong University (Engineering Science), 2013, No. 5, pp. 13-18, 23 (7 pages in total)

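Funding: National Natural Science Foundation of China (61102136; 61001013); Natural Science Foundation of Fujian Province (2011J05158; 2010J01351); Shenzhen Basic Research Program for Science and Technology Innovation (JCYJ20120618155655087)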

Abstract: To achieve effective nonlinear clustering of massive data, a distributed stream affinity propagation algorithm based on GraphLab, named GStrAP (GraphLab based stream affinity propagation), was proposed. Using GraphLab's DAG abstraction, the parallel computation was represented as a directed acyclic graph with data flowing along the edges between vertices, and the "Gather-Apply-Scatter" paradigm was applied to complete data synchronization and the algorithm's iterations. Experiments were run on the synthetic manifold datasets 3D Clusters, Aggregation, Flame and Pathbased at different data scales, and the clustering performance was compared with that of conventional K-means. The results demonstrated that GStrAP scales well with data size and effectively reduces time complexity while maintaining clustering quality.
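The message passing that GStrAP distributes can be illustrated on a single machine. The following is a minimal sketch, assuming standard affinity propagation (Frey and Dueck) with negative squared Euclidean similarity, the median off-diagonal similarity as the preference, and a damping factor of 0.5; none of these settings are taken from the paper. It is plain Python, not the GraphLab API, and it omits GStrAP's streaming and distributed execution. Each data point acts as a vertex: in each phase a vertex gathers messages from its edges and then applies its update, while the scatter step is replaced here by a synchronous sweep over all vertices. The function names `similarity` and `affinity_propagation` are hypothetical.

```python
from statistics import median


def similarity(p, q):
    # Negative squared Euclidean distance (assumed similarity measure).
    return -sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.5, iterations=200, preference=None):
    """Return (exemplar indices, exemplar index assigned to each point)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    S = [[similarity(points[i], points[k]) for k in range(n)] for i in range(n)]
    if preference is None:
        # Assumed default: median of the off-diagonal similarities.
        preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k), sent i -> k
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k), sent k -> i

    for _ in range(iterations):
        # Responsibility phase: vertex i gathers a(i, k') + s(i, k') from its
        # edges, then applies r(i, k) = s(i, k) - max_{k' != k}(a(i, k') + s(i, k')).
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # Availability phase: candidate exemplar k gathers max(0, r(i', k)),
        # then applies the damped availability update for every i.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # sum over i' != k
            for i in range(n):
                if i == k:
                    new = support  # self-availability a(k, k)
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Points whose self-evidence a(k, k) + r(k, k) is positive become exemplars.
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    # Every other point joins its most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels
```

On two well-separated groups of three points, for example, `affinity_propagation([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)])` should place the first three points in one cluster and the last three in another. Because each update reads only the messages on a vertex's own edges, the two phases map naturally onto per-vertex gather and apply steps, which is what makes a graph-parallel framework a plausible fit.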

Keywords: affinity propagation clustering algorithm; distributed computing; GraphLab; clustering ensemble

CLC number: TP301 [Automation and Computer Technology / Computer System Architecture]
