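Affiliation: [1] Institute for Data Science and Engineering, East China Normal University; Shanghai Key Laboratory of Trustworthy Computing, Shanghai 200062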
Source: Chinese Journal of Computers (《计算机学报》), 2016, No. 10, pp. 2102-2113 (12 pages)
Funding: Supported by the Key Program of the National Natural Science Foundation of China (61332006)
Abstract: In the big data era, the efficiency of table joins in database systems urgently needs optimization, and in database systems that separate baseline data from incremental data the join operator has become the main performance bottleneck. To improve transaction processing performance, this architecture typically stores baseline data on disk and incremental data in memory, achieving high transaction throughput and scalability. HBase, BigTable, and OceanBase are typical database management systems with separated baseline and incremental data, but their join efficiency is low for two main reasons: the baseline and incremental data must be merged before every join, and their more complex data storage model incurs excessive network overhead. This paper proposes an optimized sort-merge join algorithm for architectures with separated baseline and incremental data. The algorithm range-partitions the join attribute and performs sort-merge joins on multiple nodes in parallel. Because it partitions and sorts the baseline and incremental data separately, it does not need to merge them before the join and can process them in parallel; it also avoids merging the baseline and incremental versions of the many tuples that never appear in the join result. The algorithm has been implemented in OceanBase, an open-source distributed database system, and a series of experiments shows that it greatly improves OceanBase's join performance.
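The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of range-partitioned sort-merge joins over separated baseline and incremental data. Every name and simplification here is a hypothetical assumption, not the paper's implementation: both tables are joined on their primary key, each table is given as a baseline list and an incremental list of `(key, value)` pairs, a value of `None` in the incremental list marks a deletion, the split points `bounds` are supplied by the caller, and threads stand in for cluster nodes.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

DELETED = None  # hypothetical tombstone marker in the incremental data


def partition(rows, bounds):
    """Range-partition (key, value) pairs on the join key using split points."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for key, value in rows:
        parts[bisect_right(bounds, key)].append((key, value))
    return parts


def candidate_keys(base, delta):
    """Sort baseline and incremental keys separately, then merge the two
    sorted key streams into distinct keys without reconciling any values."""
    out = []
    for key in merge(sorted(k for k, _ in base), sorted(k for k, _ in delta)):
        if not out or out[-1] != key:
            out.append(key)
    return out


def join_partition(r_base, r_delta, s_base, s_delta):
    """Sort-merge join of one key range. Baseline and incremental versions
    are reconciled only for keys present on both sides."""
    r_keys, s_keys = candidate_keys(r_base, r_delta), candidate_keys(s_base, s_delta)
    r_b, r_d, s_b, s_d = map(dict, (r_base, r_delta, s_base, s_delta))
    out, i, j = [], 0, 0
    while i < len(r_keys) and j < len(s_keys):
        if r_keys[i] < s_keys[j]:
            i += 1
        elif r_keys[i] > s_keys[j]:
            j += 1
        else:
            key = r_keys[i]
            # Incremental data overrides baseline data for the same key.
            rv = r_d[key] if key in r_d else r_b[key]
            sv = s_d[key] if key in s_d else s_b[key]
            if rv is not DELETED and sv is not DELETED:
                out.append((key, rv, sv))
            i += 1
            j += 1
    return out


def parallel_join(r_base, r_delta, s_base, s_delta, bounds, workers=4):
    """Join every key range in parallel and concatenate the results in range order."""
    parts = [partition(t, bounds) for t in (r_base, r_delta, s_base, s_delta)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(join_partition, *parts)
        return [row for part in results for row in part]
```

For example, if key 2 of R is deleted in R's incremental data, key 1 of S is updated, and the split point is 4, then `parallel_join([(1, 'a'), (2, 'b'), (3, 'c'), (7, 'g')], [(2, None), (5, 'e')], [(1, 'x'), (2, 'y'), (5, 'z')], [(7, 'w'), (1, 'x2')], [4])` returns `[(1, 'a', 'x2'), (5, 'e', 'z'), (7, 'g', 'w')]`. Key 3 of R is never reconciled with any incremental version because it has no match in S.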
Classification: TP311 [Automation and Computer Technology > Computer Software and Theory]