Database star-join optimization for multicore CPU and GPU platforms

Cited by: 4


Authors: LIU Zhuan; HAN Ruichen; ZHANG Yansong [1,2,3]; CHEN Yueguo; ZHANG Yu [4]

Affiliations: [1] Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Beijing 100872, China; [2] School of Information, Renmin University of China, Beijing 100872, China; [3] National Survey Research Center at Renmin University of China, Beijing 100872, China; [4] National Satellite Meteorological Center, China Meteorological Administration, Beijing 100081, China

Source: Journal of Computer Applications (《计算机应用》), 2021, No. 3, pp. 611-617 (7 pages)

Funding: National Natural Science Foundation of China (61772533, 61732014); Beijing Natural Science Foundation (4192066).

Abstract: To address the high execution cost of star-joins between a fact table and multiple dimension tables in Online Analytical Processing (OLAP), a star-join optimization method is proposed for modern multicore CPUs (Central Processing Units) and GPUs (Graphics Processing Units). First, to reduce the intermediate materialization cost of star-joins on multicore CPU and GPU platforms, a vectorized star-join algorithm based on vector indexes is proposed for both platforms. Second, the vector index is partitioned to match the CPU cache size and the GPU shared memory size, and the star-join is executed at vector granularity, which reduces the materialization cost of the vector index. Finally, a compressed-vector star-join algorithm is proposed that compresses the fixed-length vector index into a variable-length binary vector index, improving the storage access efficiency of the vector index in cache at low selectivity. Experimental results show that the vectorized star-join algorithm improves performance by more than 40% over conventional row-wise or column-wise star-join algorithms on the multicore CPU platform, and by more than 15% over conventional star-join algorithms on the GPU platform. Compared with mainstream main-memory databases and GPU databases, the optimized star-join algorithm improves performance by 130% over the best main-memory database, Hyper, and by 80% over the best GPU database, OmniSci. The vector-index-based vectorized star-join optimization therefore effectively improves multi-table join performance: compared with traditional optimization techniques, vectorized processing based on vector indexes improves data storage access efficiency in small caches, and vector compression further improves the access efficiency of the vector index in cache.
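The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not show. It assumes that each dimension table is reduced to a vector addressed by surrogate key that holds a group id or a null marker, that the fact table is probed in fixed-size chunks standing in for cache-sized or shared-memory-sized vectors, and that the "binary vector index" is a list of (row position, group address) pairs kept only for qualifying rows. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

NULL = -1  # assumed marker for a row rejected by a predicate


def build_dim_vector(keys: Sequence[int], passes: Sequence[bool],
                     group_ids: Sequence[int], size: int) -> List[int]:
    """Dimension vector addressed by surrogate key: group id, or NULL."""
    vec = [NULL] * size
    for key, ok, gid in zip(keys, passes, group_ids):
        if ok:
            vec[key] = gid
    return vec


def star_join_chunk(fk_cols: Sequence[Sequence[int]],
                    dim_vecs: Sequence[Sequence[int]],
                    dim_cards: Sequence[int],
                    start: int, stop: int) -> List[int]:
    """Probe every dimension vector for fact rows [start, stop).

    Returns a fixed-length vector index for the chunk: one combined group
    address per row, or NULL as soon as any dimension rejects the row.
    """
    out = [0] * (stop - start)
    for fks, vec, card in zip(fk_cols, dim_vecs, dim_cards):
        for i in range(stop - start):
            if out[i] == NULL:
                continue  # already filtered out by an earlier dimension
            gid = vec[fks[start + i]]
            out[i] = NULL if gid == NULL else out[i] * card + gid
    return out


def compress(vector_index: Sequence[int], base: int) -> List[Tuple[int, int]]:
    """Variable-length form: (fact row position, group address) pairs."""
    return [(base + i, g) for i, g in enumerate(vector_index) if g != NULL]


def star_join(fk_cols: Sequence[Sequence[int]],
              dim_vecs: Sequence[Sequence[int]],
              dim_cards: Sequence[int],
              n_rows: int, chunk_rows: int) -> List[Tuple[int, int]]:
    """Run the star-join chunk by chunk and keep only compressed pairs."""
    pairs: List[Tuple[int, int]] = []
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        chunk = star_join_chunk(fk_cols, dim_vecs, dim_cards, start, stop)
        pairs.extend(compress(chunk, start))
    return pairs


def aggregate(pairs: Sequence[Tuple[int, int]],
              measure: Sequence[float]) -> Dict[int, float]:
    """Sum a fact measure per group address using the compressed index."""
    totals: Dict[int, float] = {}
    for row, addr in pairs:
        totals[addr] = totals.get(addr, 0) + measure[row]
    return totals
```

In this sketch the per-chunk vector index is short-lived and small, so only the compressed pairs are materialized across chunks. When few fact rows qualify, the pair list is much shorter than a fixed-length vector with one entry per fact row, which is the low-selectivity case the abstract refers to.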

Keywords: Online Analytical Processing (OLAP); star-join; vectorized query processing; vector compression; heterogeneous computing

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
