Physical Metadata Management in Apache IoTDB for Aggregate Queries  Cited by: 9


Authors: ZHAO Dong-Ming; QIU Yuan-Hui; KANG Rui; SONG Shao-Xu [1,2,3]; HUANG Xiang-Dong; WANG Jian-Min [1,2,3]

Affiliations: [1] School of Software, Tsinghua University, Beijing 100084, China; [2] National Engineering Research Center for Big Data Software (Tsinghua University), Beijing 100084, China; [3] Beijing National Research Center for Information Science and Technology (Tsinghua University), Beijing 100084, China

Source: Journal of Software, 2023, No. 3, pp. 1027-1048 (22 pages)

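Funding: National Natural Science Foundation of China (62072265, 62021002); National Key Research and Development Program of China (2021YFB3300500, 2019YFB1705301, 2019YFB1707001); Youth Innovation Fund of the Beijing National Research Center for Information Science and Technology (BNR2022RC01011); 2020 Emerging Platform Software Project of the Ministry of Industry and Information Technology.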

Abstract: Time series data is widely used in energy, manufacturing, finance, climate, and many other fields. Aggregate queries are common in time series analysis, where they are used to quickly obtain summaries of massive data. Storing metadata is an effective way to accelerate aggregate queries. However, most existing time series databases slice data with fixed time windows, which requires real-time sorting and partitioning. In IoT applications with high write concurrency and throughput, these additional costs are unacceptable. This study proposes a physical metadata management solution in Apache IoTDB for accelerating aggregate queries, in which data are sliced according to the physical storage characteristics of the data files. Synchronous and asynchronous computing strategies are combined so that write performance takes priority. Out-of-order data, which is common in time series, is another major challenge in IoTDB applications. This study abstracts files with overlapping time ranges into out-of-order file groups and provides metadata for each group. Aggregate queries are then rewritten into three sub-queries that are executed efficiently on physical metadata and raw time series data. Experiments on multiple datasets show the improvement in aggregate query performance achieved by the proposed solution, as well as the effect of different computing strategies on performance.
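The query-rewriting idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation or any Apache IoTDB API: the `DataFile` class, the `scan` and `aggregate` functions, and the choice of count and sum as the stored metadata are all hypothetical. The abstract does not say what the three sub-queries are; the sketch assumes one plausible reading, in which files fully covered by the query range are answered from precomputed per-file metadata, while the partially covered file at each end of the range is answered by scanning raw points. It also assumes the files do not overlap in time, so out-of-order file groups are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file of a single time series."""
    points: List[Point]  # assumed sorted by timestamp and non-empty
    # Precomputed per-file metadata (assumed fields, filled in on creation).
    start: int = field(init=False)
    end: int = field(init=False)
    count: int = field(init=False)
    total: float = field(init=False)

    def __post_init__(self) -> None:
        self.start = self.points[0][0]
        self.end = self.points[-1][0]
        self.count = len(self.points)
        self.total = sum(v for _, v in self.points)


def scan(f: DataFile, lo: int, hi: int) -> Tuple[int, float]:
    """Aggregate raw points of one file with lo <= timestamp <= hi."""
    vals = [v for t, v in f.points if lo <= t <= hi]
    return len(vals), sum(vals)


def aggregate(files: List[DataFile], lo: int, hi: int) -> Tuple[int, float]:
    """Return (count, sum) over [lo, hi], assuming non-overlapping files."""
    count, total = 0, 0.0
    for f in files:
        if f.end < lo or f.start > hi:
            continue  # file lies entirely outside the query range
        if lo <= f.start and f.end <= hi:
            # Fully covered file: answer from metadata, no raw data read.
            count += f.count
            total += f.total
        else:
            # Partially covered boundary file: fall back to raw data.
            c, s = scan(f, lo, hi)
            count += c
            total += s
    return count, total


files = [
    DataFile([(1, 1.0), (2, 2.0)]),
    DataFile([(3, 3.0), (4, 4.0)]),
    DataFile([(5, 5.0), (6, 6.0)]),
]
result = aggregate(files, 2, 5)
```

In this example, only the first and last files are scanned; the middle file contributes through its stored count and sum. Because the metadata follows file boundaries rather than fixed time windows, nothing has to be re-sorted or re-partitioned at write time.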

Keywords: pre-aggregation; aggregate query; query rewriting; physical metadata management; time series database

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
