检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:周爽[1] 鲍玉斌[1] 王志刚[1] 冷芳玲[1] 于戈[1] 邓超[2] 郭磊涛[2]
机构地区:[1]东北大学信息科学与工程学院,沈阳110819 [2]中国移动通信研究院业务支撑研究所,北京100053
出 处:《计算机科学与探索》2014年第1期40-50,共11页Journal of Frontiers of Computer Science and Technology
基 金:国家自然科学基金Nos.61033007;61173028;中央高校基本科研业务费专项资金No.N100704001;教育部-中国移动科研基金No.MCM20125021~~
摘 要:图数据划分是基于BSP(bulk synchronous parallel)编程模型的大规模图处理系统中一个关键技术问题。传统的图划分技术需要多次迭代,时间复杂度过高,且划分结果不具有图顶点到分区的映射信息,因此这些算法并不适用于BSP模型下的数据划分。提出了一种新的面向BSP模型的负载均衡Hash数据划分算法(balanced Hash partition,BHP)。为了实现各个分区的出边数尽可能均衡,该算法引入了虚拟桶的概念,通过贪婪算法将虚拟桶重组为实际分区,保证了每个实际分区负载均衡,同时数据本地化策略使本分片上的数据尽可能地保留在本节点上,从而减小在数据加载时的数据迁移开销。从三个方面对比了BHP算法和经典Hash算法的性能,结果表明BHP算法能够提高作业的执行效率,减少消息发送的数量,有效解决了经典Hash算法的负载不均衡和分区间交互边过多的问题,当数据量变大时,效果尤为明显。Graph data partition is a critical technical problem in the large-scale graph processing system based on bulk synchronous parallel (BSP) model. Traditional graph partition technology requires many iterations, the time com- plexity is too high, and the partition results don' t have the mapping information from vertexes to partitions. So those algorithms are not suitable for partitioning graph data for the BSP model based systems. This paper presents a new Hash partition algorithm that implements load balance, called balanced Hash partition (BHP). In order to achieve the balance of the number of out degrees in each partition as much as possible, the concept of virtual bucket is introduced. Then these virtual buckets are reorganized into desired partitions by using a greedy algorithm. In this way, the algorithm can ensure the load balance of each partition. At the same time, the data localization strategy makes the data on the split locate on the corresponding node as much as possible. So, the overhead of data migration during the data loading can be reduced. This paper does some experiments to compare the performance between BHP and the classic Hash algorithm from multiple aspects. The results show that the BHP algorithm can improve the job executing efficiency,reduce the number of sending messages, and effectively solve the problems of load imbalance and too many interac- tive edges among partitions.
关 键 词:BSP模型 图划分 分布式系统 负载均衡 虚拟桶
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.7