Distributed Parallel Multi-Join Query Optimization in Information Network Model


Authors: Xu Jing [1], Liu Mengchi [1,2]

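Affiliations: [1] State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China; [2] School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China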

Source: Computer Applications and Software, 2017, No. 7, pp. 66-73, 84 (9 pages in total)

Funding: National Natural Science Foundation of China (61672389; 61202100); Open Fund of the State Key Laboratory of Software Engineering (SKLSE2012-09-20)

Abstract: In a distributed cluster system, data is stored across the nodes of the cluster according to a partitioning algorithm, so complex queries that involve many join operations incur expensive network communication. To address this problem, a Minimum Traffic Query Split algorithm (MTQS) and a Multi-Objective Query Optimization algorithm (MOQO) are proposed, both based on the Information Network Model (INM). MTQS splits a complex query into several PWOC (parallelizable without communication) sub-queries, so that all sub-queries can run in parallel with approximately no communication. MOQO treats sub-queries as the basic operations of a query plan, takes parallelism and communication cost as joint optimization objectives, and builds the query plan tree using a traditional multi-objective weighted algorithm combined with a greedy strategy as its evaluation criterion. Finally, test data is generated from the TPC-H benchmark and the original algorithm is compared experimentally with the optimized one; the results show that the optimized algorithm greatly improves the efficiency of complex queries.
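The abstract does not give the cost model or the exact procedure, so the following is only a minimal sketch of the general technique it names: a weighted-sum combination of two objectives (communication cost and parallelism) driving a greedy construction of a plan tree. It is not the paper's MOQO algorithm. The names `Plan`, `join`, `score`, and `greedy_plan`, the result-size estimate, the use of tree height as a stand-in for parallelism, and the weights are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """A plan-tree node: a leaf sub-query or a join of two plans."""
    name: str                      # leaf name or "(left JOIN right)"
    size: float                    # estimated result size (hypothetical unit: rows)
    height: int = 0                # height of the plan tree; 0 for a leaf
    left: Optional["Plan"] = None
    right: Optional["Plan"] = None


def join(a: Plan, b: Plan) -> Plan:
    # Assumption for illustration: the result is as large as the larger input.
    return Plan(f"({a.name} JOIN {b.name})", max(a.size, b.size),
                max(a.height, b.height) + 1, a, b)


def score(a: Plan, b: Plan, w_comm: float, w_par: float, total: float) -> float:
    # Objective 1 (assumed): ship the smaller input; normalise by total size.
    comm = min(a.size, b.size) / total
    # Objective 2 (assumed): a shallower tree leaves more joins runnable in parallel.
    depth = max(a.height, b.height) + 1
    # Weighted-sum scalarisation of the two objectives; lower is better.
    return w_comm * comm + w_par * depth


def greedy_plan(subqueries, w_comm: float = 0.5, w_par: float = 0.5) -> Plan:
    """Greedily join the pair with the lowest weighted score until one plan remains."""
    plans = list(subqueries)
    total = sum(p.size for p in plans)
    while len(plans) > 1:
        a, b = min(combinations(plans, 2),
                   key=lambda pair: score(pair[0], pair[1], w_comm, w_par, total))
        plans = [p for p in plans if p is not a and p is not b]
        plans.append(join(a, b))
    return plans[0]


# Four hypothetical sub-queries with made-up result sizes.
SUBQUERIES = [Plan("q1", 10), Plan("q2", 20), Plan("q3", 1000), Plan("q4", 2000)]
```

With these made-up sizes, equal weights yield a bushy tree of height 2, while `greedy_plan(SUBQUERIES, 1.0, 0.0)`, which scores on communication alone, yields a chain of height 3. This shows how the weights trade communication against parallelism in such a scheme.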

Keywords: query optimization; distributed parallel processing; multi-join; Information Network Model (INM)

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
