Fast Storage System for Time-series Big Data Streams Based on Waterwheel Model (基于水车模型的时序大数据快速存储)  Cited by: 2


Authors: LU Mingchen (陆铭琛); LYU Yanqi (吕晏齐); LIU Ruicheng (刘睿诚); JIN Peiquan (金培权)[1] (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China)

Affiliation: [1] School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China

Source: Computer Science (《计算机科学》), 2023, No. 1, pp. 25-33 (9 pages)

Funding: National Natural Science Foundation of China (62072419).

Abstract: With the rapid development of the Internet of Things, the scale of sensor deployment has grown steadily in recent years. Large-scale sensor deployments generate massive data streams every second, and the value of the data decreases over time. The storage system must therefore withstand the write pressure of high-speed incoming data streams and persist the data as quickly as possible for subsequent query and analysis, which places higher demands on its write performance. A fast storage system based on the waterwheel model can meet the fast storage requirements of high-speed time-series data streams in big data application scenarios. The system is deployed between the high-speed time-series data streams and the underlying storage nodes. It uses multiple data buckets to build a logically rotating storage model (similar to the ancient Chinese waterwheel) and coordinates data writing and persisting by controlling the state of each data bucket. The waterwheel model assigns data buckets to different underlying storage nodes, so that the instantaneous write pressure is spread evenly across multiple storage nodes and write throughput is improved through multi-node parallel writing. The waterwheel model is deployed on stand-alone MongoDB and compared experimentally with distributed MongoDB. The results show that the waterwheel model effectively improves the system's write throughput, reduces write latency, and has good horizontal scalability.
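The rotation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Waterwheel`, `Bucket`, and `BucketState`, the three bucket states, the fixed bucket capacity, and the round-robin bucket-to-node assignment are all assumptions made for illustration. Storage nodes are stood in for by in-memory lists rather than MongoDB instances, and the flush runs synchronously, whereas a real system would presumably persist a full bucket in the background while the next one fills.

```python
from enum import Enum


class BucketState(Enum):
    IDLE = "idle"          # empty, waiting for its turn on the wheel
    FILLING = "filling"    # currently receiving incoming records
    FLUSHING = "flushing"  # full, being persisted to its storage node


class Bucket:
    def __init__(self, node_id, capacity):
        self.node_id = node_id    # index of the storage node this bucket flushes to
        self.capacity = capacity  # hypothetical fixed number of records per bucket
        self.records = []
        self.state = BucketState.IDLE


class Waterwheel:
    """Hypothetical sketch: buckets rotate logically, one fills while full ones flush."""

    def __init__(self, num_buckets, capacity, nodes):
        # nodes: list of in-memory lists standing in for storage nodes (assumption)
        self.nodes = nodes
        # Assumed round-robin assignment of buckets to nodes
        self.buckets = [Bucket(i % len(nodes), capacity) for i in range(num_buckets)]
        self.current = 0
        self.buckets[0].state = BucketState.FILLING

    def write(self, record):
        bucket = self.buckets[self.current]
        bucket.records.append(record)
        if len(bucket.records) >= bucket.capacity:
            self._rotate()

    def _rotate(self):
        full = self.buckets[self.current]
        full.state = BucketState.FLUSHING
        # Advance the wheel so the next bucket takes over incoming writes
        self.current = (self.current + 1) % len(self.buckets)
        # Synchronous flush for simplicity; stands in for a background write
        self._flush(full)
        self.buckets[self.current].state = BucketState.FILLING

    def _flush(self, bucket):
        self.nodes[bucket.node_id].extend(bucket.records)
        bucket.records = []
        bucket.state = BucketState.IDLE


nodes = [[], [], []]
wheel = Waterwheel(num_buckets=3, capacity=2, nodes=nodes)
for i in range(12):
    wheel.write(i)
print(nodes)
```

With three buckets of capacity two mapped to three nodes, consecutive pairs of records land on successive nodes, so no single node absorbs the whole burst.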

Keywords: time-series big data; streaming data; fast storage; waterwheel model; middleware

Classification: TP311 [Automation and Computer Technology - Computer Software and Theory]

 
