检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]西安电子科技大学计算机学院,西安710071
出 处:《西安交通大学学报》2014年第11期97-102,共6页Journal of Xi'an Jiaotong University
基 金:国家科技支撑计划资助项目(2012BAH01B05);陕西省科技统筹创新工程计划资助项目(2012KTZD-02-05-2)
摘 要:针对传统的并行哈希划分算法不能高效地利用多核处理器的并行资源,且不能较好处理有倾斜的输入数据的问题,提出了一种在多核处理器中基于MapReduce的哈希划分算法,并且提出了存储结构优化、多步划分优化、数据倾斜优化3种优化策略。该算法将输入数据分成若干块后提交给各个线程并行处理,并选择合适的策略避免写冲突,使其能够高效地利用多核处理器的并行资源。文中提出的哈希表能够提高cache效率,从而提升算法的整体性能。引入MapReduce模型可使多步哈希划分在Map过程和Reduce过程中分别进行;数据倾斜优化策略能使算法适应有倾斜的输入数据,且具有较好的效果。实验结果表明:在多核处理器中,文中提出的算法能够适应各种分布的输入数据,并且使哈希划分的整体性能得到提升。A hash partitioning method based on MapReduce framework and three efficient optimizations including storage structure optimization, multi-pass partitioning optimization and skew data optimization on chip multiproeessor (CMP) are proposed to address the problems that conventional hash partitioning method cannot take full advantage of CMP's parallel execution resources and properly process the skew input data. The input data are split into several units which are later processed by all threads simultaneously, and suitable strategy is adopted to avoid writing collision hence CMP's parallel execution resources could be fully unitized. The new hash table proposed in this paper can improve the overall performance by increasing the cache efficiency. The introduction of MapReduce framework makes it possible to multiply partition the data in Map phase and Reduce phase, respectively. In addition, the skew data optimization can make the proposed method suitable for processing various skew input data. Experiments have testified these advantages displayed by the proposed hash partitioning method.
关 键 词:数据划分 哈希处理 多核处理器 MAPREDUCE模型
分 类 号:TP392[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.222