Approximate query processing approach based on deep autoregressive model


Authors: CEN Libin; LI Jingdong; LIN Chunbo; WANG Xiaoling [1] (School of Computer Science and Technology, East China Normal University, Shanghai 200062, China; Gauss Database Labs, Huawei Technologies Company Limited, Shanghai 201206, China)

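Affiliations: [1] School of Computer Science and Technology, East China Normal University, Shanghai 200062 [2] Gauss Database Labs, Huawei Technologies Company Limited, Shanghai 201206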

Source: Journal of Computer Applications, 2023, No. 7, pp. 2034-2039 (6 pages)

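Funding: Key Program of the National Natural Science Foundation of China (62136002); Key Project of the Science and Technology Commission of Shanghai Municipality (20DZ1100300).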

Abstract: Approximate Query Processing (AQP) of aggregate functions has become an active research topic in the database field in recent years. Existing approximate query techniques suffer from long query response times, high storage overhead, and no support for multi-predicate queries. To address these problems, DeepAQP (Deep Approximate Query Processing), an AQP approach based on a deep autoregressive model, is proposed. DeepAQP uses a deep autoregressive model to learn the joint probability distribution of multiple columns in a table, and from it estimates the predicate selectivity and the probability distribution of the target column for a given query, so that approximate aggregate queries with multiple predicates over a single table can be answered efficiently. Experiments on the TPC-H and TPC-DS datasets show that, compared with the sampling-based method VerdictDB, DeepAQP reduces query response time by 2 to 3 orders of magnitude and storage space by 3 orders of magnitude. Compared with DBEst++, a method based on traditional machine learning models, DeepAQP reduces query response time by 1 order of magnitude, significantly reduces model training time, and handles multi-predicate queries that DBEst++ does not support. DeepAQP thus balances query accuracy and speed while significantly reducing training and storage overhead.
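The abstract's pipeline (joint distribution, then selectivity and target-column distribution, then aggregate) can be illustrated with a minimal sketch. This is not the authors' implementation: the deep autoregressive model is replaced here by exact conditional frequency tables, the table contents are made up, and every name (`TabularAutoregressiveModel`, `estimate`, `approximate_aggregates`) is hypothetical. It only shows how COUNT, AVG, and SUM could be derived from an autoregressive factorization P(x1) P(x2 | x1) P(x3 | x1, x2).

```python
from collections import Counter, defaultdict

# Toy table with columns (region, category, price). All values are invented.
ROWS = [
    ("east", "a", 10.0),
    ("east", "a", 20.0),
    ("east", "b", 30.0),
    ("west", "a", 40.0),
    ("west", "b", 50.0),
    ("west", "b", 60.0),
]


class TabularAutoregressiveModel:
    """Count-based stand-in (an assumption) for a learned autoregressive model.

    Column i is modeled conditionally on columns 0..i-1, so the product of
    the conditionals gives the joint probability of a row.
    """

    def __init__(self, rows):
        self.n_rows = len(rows)
        self.n_cols = len(rows[0])
        # cond[i][prefix] counts the values of column i seen after `prefix`.
        self.cond = [defaultdict(Counter) for _ in range(self.n_cols)]
        for row in rows:
            for i in range(self.n_cols):
                self.cond[i][row[:i]][row[i]] += 1

    def conditional(self, i, prefix):
        """Return P(column i = value | earlier columns = prefix) as a dict."""
        counts = self.cond[i].get(prefix, Counter())
        total = sum(counts.values())
        return {value: c / total for value, c in counts.items()}


def estimate(model, predicates, target_col):
    """Return (selectivity, distribution of the target column given predicates).

    `predicates` maps a column index to a function value -> bool.
    Exact enumeration is used here; it is only feasible for tiny domains.
    """
    joint = defaultdict(float)  # target value -> P(target = value AND predicates)

    def walk(i, prefix, prob):
        if i == model.n_cols:
            joint[prefix[target_col]] += prob
            return
        accept = predicates.get(i, lambda value: True)
        for value, p in model.conditional(i, prefix).items():
            if accept(value):
                walk(i + 1, prefix + (value,), prob * p)

    walk(0, (), 1.0)
    selectivity = sum(joint.values())
    if selectivity == 0:
        return 0.0, {}
    return selectivity, {v: p / selectivity for v, p in joint.items()}


def approximate_aggregates(model, predicates, target_col):
    """Derive COUNT, AVG, and SUM from selectivity and the target distribution."""
    selectivity, dist = estimate(model, predicates, target_col)
    count = selectivity * model.n_rows
    avg = sum(v * p for v, p in dist.items()) if dist else None
    total = count * avg if avg is not None else 0.0
    return {"COUNT": count, "AVG": avg, "SUM": total}


if __name__ == "__main__":
    model = TabularAutoregressiveModel(ROWS)
    # Roughly: SELECT COUNT(*), AVG(price), SUM(price)
    #          WHERE region = 'west' AND category = 'b'
    query = {0: lambda v: v == "west", 1: lambda v: v == "b"}
    print(approximate_aggregates(model, query, target_col=2))
```

Because the stand-in model stores exact frequencies, the estimates on this toy table coincide with the true aggregates. A learned model would trade that exactness for a compact representation, which is the storage and latency trade-off the abstract describes.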

Keywords: approximate query processing; autoregressive model; multi-predicate query; deep learning; aggregate function

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
