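Authors: Chen Wenqiang (陈文强) [1], Lin Chen (林琛) [1,2], Chen Ke (陈珂) [3], Chen Jinxiu (陈锦秀) [1], Zou Quan (邹权) [1,2]
Affiliations: [1] School of Information Science and Technology, Xiamen University; [2] Shenzhen Research Institute of Xiamen University, Shenzhen 518057, Guangdong; [3] Department of Computer Science and Technology, Guangdong University of Petrochemical Technology, Maoming 525000, Guangdong
Source: Journal of Shandong University (Engineering Science), 2013, No. 5, pp. 13-18, 23 (7 pages)
Funding: National Natural Science Foundation of China (61102136; 61001013); Natural Science Foundation of Fujian Province (2011J05158; 2010J01351); Shenzhen Science and Technology Innovation Basic Research Program (JCYJ20120618155655087)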
Abstract: To perform nonlinear clustering of massive data efficiently, the paper proposes GStrAP (GraphLab based stream affinity propagation), a distributed streaming affinity propagation algorithm built on GraphLab. The algorithm abstracts the data as a directed acyclic graph (DAG), with data flowing along the edges between vertices, and applies the "Gather-Apply-Scatter" paradigm to carry out data synchronization and the algorithm's iterations. Experiments on the synthetic manifold datasets 3D Clusters, Aggregation, Flame, and Pathbased were run at different data scales, and clustering performance was compared with conventional K-means. The results indicate that the GraphLab-based affinity propagation algorithm scales well with data size and reduces time complexity while preserving clustering quality.
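Keywords: affinity propagation clustering algorithm; distributed computing; GraphLab; cluster ensemble
Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]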
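
For readers unfamiliar with the message passing that affinity propagation relies on, the following is a minimal single-machine sketch of the standard algorithm in Python with NumPy. It is not GStrAP and does not use GraphLab: the function name `affinity_propagation`, the damping factor, the iteration count, the sample points, and the choice of negative squared Euclidean distance with a median preference are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain affinity propagation on a dense similarity matrix S (n x n, n >= 2).

    The diagonal of S holds the "preference" of each point to be an exemplar.
    Returns, for each point, the index of the exemplar it selects.
    Hypothetical helper for illustration only; not the paper's GStrAP.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)

    for _ in range(iterations):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # Availability: for i != k,
        #   a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # and a(k,k) = sum_{i' != k} max(0, r(i',k)).
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]      # keep r(k,k) unclipped
        col_sum = Rp.sum(axis=0)            # one aggregate per column k
        A_new = col_sum[None, :] - Rp
        diag = A_new[rows, rows].copy()     # a(k,k) is not clipped at 0
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new

    # Each point picks the k that maximizes a(i,k) + r(i,k).
    return np.argmax(A + R, axis=1)


def similarity_matrix(X):
    """Negative squared Euclidean distance, with the median as preference."""
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sum(diff ** 2, axis=2)
    off_diagonal = S[~np.eye(len(X), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diagonal))
    return S


# Two small, well-separated groups of 2-D points (made-up sample data).
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                   [10.0, 10.0], [10.1, 10.0], [10.0, 10.2]])
labels = affinity_propagation(similarity_matrix(points))
print(labels)
```

Each update reads and writes only per-pair messages plus one aggregate per column, which is the sort of structure a vertex-centric Gather-Apply-Scatter framework can distribute. How GStrAP actually maps these updates onto GraphLab is not described in this record.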