面向Select和Sort的数据库算子缓存的设计与实现  

Design and Implementation of Database Operator Cache for Select and Sort

在线阅读下载全文

作  者:蔡万里 王新硕 胡卉芪 蔡鹏 周烜 屠要峰[2] CAI Wan-Li;WANG Xin-Shuo;HU Hui-Qi;CAI Peng;ZHOU Xuan;TU Yao-Feng(School of Data Science and Engineering,East China Normal University,Shanghai 200062;ZTE Corporation,Nanjing 210012)

机构地区:[1]华东师范大学数据科学与工程学院,上海200062 [2]中兴通讯股份有限公司,南京210012

出  处:《计算机学报》2024年第9期2084-2103,共20页Chinese Journal of Computers

基  金:国家自然科学基金(No.92270202);上海市自然科学基金(No.23ZR1418300)资助;中兴通讯研究基金(编号HC-CN-20220721010)资助.

摘  要:缓存是数据库中提高查询性能的一种常用技术.目前,现有数据库缓存主要有两个方向:查询结果缓存和存储层块缓存.查询结果缓存是利用数据库查询执行的最终结果或中间结果(如子查询),而存储层块缓存则缓存查询涉及的底层数据块.本文从另外一个角度“缓存中含有的计算量”来重新审视缓存在查询优化中的应用,并以此为基础进一步划分数据库缓存方式.在查询执行过程中,数据库查询被转换成一系列操作(例如选择、排序等)的集合,而算子对应操作.查询处理中算子输出的数据为中间结果,含有部分计算量,我们将这部分数据进行缓存并加以利用.我们将这种缓存部分计算量的缓存方式称为算子缓存,即缓存每个操作执行后的结果.由于不同查询之间可能会存在相同算子,对相近数据执行相同计算,因此利用算子缓存加速查询执行性能具有相当大的潜力.本文的新颖之处在于从缓存含有的计算量角度出发,提出并研究算子缓存如何在查询优化中应用.本文以Filter、Sort算子为例,针对缓存复用提出了一种基于语义树的匹配算法,用于快速匹配缓存中的结果集.同时,针对复用缓存可能劣化查询性能的情况,提出使用基于成本的代价优化器防止使用缓存劣化查询性能.最后,本文基于开源分析型数据库ClickHouse实现了Filter、Sort算子缓存的原型,并对提出的算子缓存方案进行了大量的实验测试.结果表明,相比块缓存、物化视图方式,本文提出的算子缓存方案在本地SSD部署下最大能够分别提升9倍以及1.5倍的查询响应速度,在云环境下部署能够分别提升30倍以及2倍的查询响应速度.Caching is a commonly used technique in databases to enhance query performance.Currently,existing database caching primarily falls into two directions:result caching and block caching.Result caching involves utilizing the final or intermediate results(such as subqueries)obtained during the execution of database queries,while block caching stores the underlying data blocks involved in the queries.This paper takes a different perspective,focusing on the‘computational load’contained within the cache,to reexamine the application of caching in query optimization.Building on this,the paper further classifies database caching methods.During query execution,a database query is transformed into a collection of operations(such as selection,sorting,etc.),with each operation corresponding to an operator.The data output by operators during query processing forms intermediate results,containing a portion of the computational load.We cache and leverage this subset of data.This caching approach,which caches a portion of the computational load,is termed‘operator caching’,specifically caching the results of each operation execution.Due to the potential presence of common operators across different queries,performing similar computations on similar data,leveraging operator caching holds significant potential for accelerating query execution performance.The novelty of this paper lies in its exploration of how operator caching can be applied in query optimization,viewed from the perspective of the computational load contained in the cache.Using Filter and Sort operators as examples,we propose and investigate how operator caching can be employed in query optimization.For cache reuse,we introduce a semantic tree-based matching algorithm designed to efficiently match result sets in the cache.Simultaneously,to address the potential degradation of query performance caused by reusing the cache,we suggest the use of a cost-based optimizer to prevent the degradation of query performance.Finally,based on the open-source analytical dat

关 键 词:数据库 查询执行 查询优化 算子缓存 联机分析处理 

分 类 号:TP315[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象