多核处理器中基于MapReduce的哈希划分优化  被引量:1

Hash Partitioning Optimizations Based on MapReduce for Chip Multiprocessors

在线阅读下载全文

作  者:袁通[1] 刘志镜[1] 刘慧[1] 王梓[1] 

机构地区:[1]西安电子科技大学计算机学院,西安710071

出  处:《西安交通大学学报》2014年第11期97-102,共6页Journal of Xi'an Jiaotong University

基  金:国家科技支撑计划资助项目(2012BAH01B05);陕西省科技统筹创新工程计划资助项目(2012KTZD-02-05-2)

摘  要:针对传统的并行哈希划分算法不能高效地利用多核处理器的并行资源,且不能较好处理有倾斜的输入数据的问题,提出了一种在多核处理器中基于MapReduce的哈希划分算法,并且提出了存储结构优化、多步划分优化、数据倾斜优化3种优化策略。该算法将输入数据分成若干块后提交给各个线程并行处理,并选择合适的策略避免写冲突,使其能够高效地利用多核处理器的并行资源。文中提出的哈希表能够提高cache效率,从而提升算法的整体性能。引入MapReduce模型可使多步哈希划分在Map过程和Reduce过程中分别进行;数据倾斜优化策略能使算法适应有倾斜的输入数据,且具有较好的效果。实验结果表明:在多核处理器中,文中提出的算法能够适应各种分布的输入数据,并且使哈希划分的整体性能得到提升。A hash partitioning method based on MapReduce framework and three efficient optimizations including storage structure optimization, multi-pass partitioning optimization and skew data optimization on chip multiproeessor (CMP) are proposed to address the problems that conventional hash partitioning method cannot take full advantage of CMP's parallel execution resources and properly process the skew input data. The input data are split into several units which are later processed by all threads simultaneously, and suitable strategy is adopted to avoid writing collision hence CMP's parallel execution resources could be fully unitized. The new hash table proposed in this paper can improve the overall performance by increasing the cache efficiency. The introduction of MapReduce framework makes it possible to multiply partition the data in Map phase and Reduce phase, respectively. In addition, the skew data optimization can make the proposed method suitable for processing various skew input data. Experiments have testified these advantages displayed by the proposed hash partitioning method.

关 键 词:数据划分 哈希处理 多核处理器 MAPREDUCE模型 

分 类 号:TP392[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象