Research on MapReduce-Based In-Memory Parallel Join Algorithm

Cited by: 1


Authors: Li Cheng[1,2], Xu Yinlong[1,2], Guo Fan[1,2], Wu Si[1,2]

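Affiliations: [1] School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027; [2] Anhui Province Key Laboratory of High Performance Computing, Hefei, Anhui 230027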

Source: Computer Applications and Software (《计算机应用与软件》), 2016, No. 7, pp. 257-260, 277 (5 pages in total)

Abstract: Traditional parallel Join algorithms lack the necessary fault tolerance, and uneven data partitioning often leaves a single blocked thread as the bottleneck of the entire task. To address these problems, this paper analyzes how each phase of an in-memory join affects Join performance and proposes an approach that exploits the dynamic mechanism of MapReduce, thereby avoiding the uneven data and task assignment and the fault-tolerance problems of traditional parallel Join implementations. The algorithm is built on the MapReduce programming framework and, by encapsulating tags at the block level, reduces the computational overhead of tagging and sorting during MapReduce Join execution, which significantly improves performance. Experimental results show that on a shared-memory architecture the algorithm clearly outperforms existing algorithms.
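The abstract does not describe the algorithm in detail, so the following Python code is only a minimal sketch of the general idea under our own assumptions. It implements a reduce-side (repartition) hash join in which the source tag ("R" or "S") is attached once per block of records rather than once per record, so a reducer can separate the two relations block by block instead of inspecting and sorting individually tagged records. The inputs are cut into small blocks that a thread pool hands to whichever worker is free, a simple stand-in for dynamic task assignment. All names and parameters (`make_blocks`, `map_block`, `reduce_partition`, `block_tagged_join`, `block_size`, `num_partitions`, `workers`) are hypothetical. Under CPython's GIL these threads illustrate the structure but do not give a real parallel speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def make_blocks(relation, tag, block_size):
    """Split a relation of (key, value) pairs into small blocks.
    Hypothetical scheme: each block carries ONE source tag instead of tagging every record."""
    return [(tag, relation[i:i + block_size]) for i in range(0, len(relation), block_size)]


def map_block(block, num_partitions):
    """Map task: hash-partition one block's records by join key.
    The tag is emitted once per (block, partition) pair, not once per record."""
    tag, records = block
    buckets = defaultdict(list)
    for key, value in records:
        buckets[hash(key) % num_partitions].append((key, value))
    return {p: (tag, recs) for p, recs in buckets.items()}


def reduce_partition(tagged_blocks):
    """Reduce task: build a hash table from the R blocks, then probe it with the S blocks.
    Relations are told apart by the block-level tag; no per-record tag check or sort is needed."""
    table = defaultdict(list)
    for tag, records in tagged_blocks:
        if tag == "R":
            for key, value in records:
                table[key].append(value)
    out = []
    for tag, records in tagged_blocks:
        if tag == "S":
            for key, s_value in records:
                for r_value in table.get(key, ()):
                    out.append((key, r_value, s_value))
    return out


def block_tagged_join(r, s, block_size=2, num_partitions=4, workers=4):
    """Equi-join R and S on the key; returns sorted (key, r_value, s_value) tuples."""
    blocks = make_blocks(r, "R", block_size) + make_blocks(s, "S", block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: small blocks are picked up by whichever worker is idle.
        mapped = list(pool.map(lambda b: map_block(b, num_partitions), blocks))
        # Shuffle: gather the tagged sub-blocks that belong to each partition.
        partitions = [[] for _ in range(num_partitions)]
        for out in mapped:
            for p, tagged in out.items():
                partitions[p].append(tagged)
        # Reduce phase: one task per partition.
        results = list(pool.map(reduce_partition, partitions))
    return sorted(row for part in results for row in part)


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (3, "c")]
    s = [(1, "x"), (1, "y"), (3, "z"), (4, "w")]
    print(block_tagged_join(r, s))
```

Running the script prints `[(1, 'a', 'x'), (1, 'a', 'y'), (3, 'c', 'z')]`. The sketch always builds its hash table on R. A practical implementation would more likely build on the smaller relation and tune the block size so that blocks stay small enough for dynamic load balancing but large enough to amortize the per-block tag.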

Keywords: in-memory join; data encapsulation; MapReduce

CLC number: TP3 [Automation and Computer Technology / Computer Science and Technology]
