基于增量局部加权学习的查询模板自适应基数估计  

Incremental Locally Weighted Learning for Adaptive Cardinality Estimation of Query Template

在线阅读下载全文

作  者:冯杰明 李战怀[1,2] 陈群[1,2] 陈肇强[1,2] FENG Jie-Ming;LI Zhan-Huai;CHEN Qun;CHEN Zhao-Qiang(School of Computer Science and Technology,Northwestern Polytechnical University,Xi’an 710072;Key Laboratory of Big Data Storage and Management,Northwestern Polytechnical University,Ministry of Industry and Information Technology,Xi’an 710072)

机构地区:[1]西北工业大学计算机学院,西安710072 [2]西北工业大学大数据存储与管理工业和信息化部重点实验室,西安710072

出  处:《计算机学报》2022年第1期17-34,共18页Chinese Journal of Computers

基  金:国家重点研发计划(2018YFB1003400);国家自然科学基金重点项目(61732014);国家自然科学基金面上项目(61672432);中央高校基本科研业务费专项资金资助(3102019DX1004);陕西省自然科学基础研究计划(2018JM6086)资助.

摘  要:基数估计是基于代价查询优化的关键步骤,已经被研究了近40年.传统方法如基于直方图的方法在一些假设如属性相互独立、相交的表满足包含原则等成立时能基本满足准确性要求.然而,在真实运行环境中这些假设往往不再成立,可能导致基数估计严重错误进而造成查询延迟.近年来,随着数据的增多和新硬件的发展,使用机器学习方法来提高基数估计的质量成为了可能.由于基于代价的查询优化主要根据查询中子执行计划的估计代价来选择最优的查询执行计划,因此,有一些最近的工作针对一些关键的子执行计划模板建立相应的局部学习模型,取得了不错的进展.但是,这些局部模型主要用于查询(查询空间)分布和数据(数据库数据)分布不变的场景,而在真实运行环境中,它们往往不断地发生变化,限制了这些估计技术的有效性.在本文中,我们针对子执行计划模板在查询分布和数据分布不断变化的环境下提出了一种使用增量的局部加权学习进行自适应基数估计的方法.具体地说,首先抽取子执行计划的语义和统计特征使之能代表当前查询和数据的特性,然后使用增量的局部加权学习模型根据查询分布和数据分布的变化进行自适应的学习,实现基数估计.最后,通过对比实验验证了本文方法的有效性.Cardinality estimation is crucial for cost-based query optimization and has been studied for nearly forty years.The traditional approach based on histograms can achieve desired accuracy when some assumptions,e.g.independence of attributes and the principle of inclusion between join tables,are valid.However,these assumptions usually become invalid in the real running environment.It may lead to serious estimation errors and thus increase query latency.In recently years,with the increase of data and the development of new hardware,it is possible to use machine learning to improve the accuracy of cardinality estimation.Because cost-based query optimization chooses the optimal execution plan according to cost estimation of sub execution plan in a given query.Therefore,there are some recent works which studied how to build local learning models for sub execution plan templates.However,these models are supposed to be used in the scenarios where query(query space)distribution and data(database data)distribution remain static.However,in real scenarios,where query and data distributions are usually continuously shifting,they may have limited efficacy.To address the said shortcoming,in this paper we propose an approach based on incremental locally weighted learning to perform adaptive cardinality estimation.Specifically,it first extracts both semantic and statistic features of a sub execution plan,which represent the current characteristics of both query and data.The reasonableness behind that is,the semantic features solely represent variables of the query template and are independent with the database.In some cases,we even cannot collect semantic features,e.g.there is no variable for two tables joining without predicate.Therefore,other features are needed.On the other hand,the statistic features based on current data in the database,e.g.the statistics of attributes in a table,are very effective for predicting the cardinality.However,they are not constantly reliable.Combining with the semantic features can relieve this unc

关 键 词:基数估计 查询优化 执行计划 自适应学习 增量学习 局部加权学习 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象