Storage and Indexing of Spatiotemporal Big Data Using the Hilbert Curve and Cassandra  (cited by 14)

Hilbert Curve and Cassandra Based Indexing and Storing Approach for Large-Scale Spatiotemporal Data

Authors: CAO Buyang, FENG Huasen, LIANG Junhao, LI Xiang[2] (College of Architecture and Urban Planning, Tongji University, Shanghai 200092, China; Key Laboratory of Geographical Information Science (Ministry of Education), School of Geographic Sciences, East China Normal University, Shanghai 200241, China)

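Affiliations: [1] College of Architecture and Urban Planning, Tongji University, Shanghai 200092; [2] Key Laboratory of Geographical Information Science (Ministry of Education), School of Geographic Sciences, East China Normal University, Shanghai 200241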

Source: Geomatics and Information Science of Wuhan University (《武汉大学学报(信息科学版)》), 2021, No. 5, pp. 620-629 (10 pages)

Funding: National Natural Science Foundation of China (41771410).

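Abstract: As more and more trajectory data are recorded, the massive and complex data produced in various application scenarios require efficient storage and indexing. Traditional relational databases struggle to meet the storage, scalability, and specific query requirements of massive trajectory data, whereas non-relational (NoSQL) databases, which scale easily, read and write quickly, and cost little, offer a feasible solution. We design and implement a Cassandra-based method for data dimensionality reduction and key-value storage and indexing that manages spatiotemporal trajectory data efficiently. To further improve efficiency, Hilbert curve encoding is incorporated to divide space into small cells, and the trajectory data are mapped to these cells. Exploiting the principle of spatiotemporal locality, corresponding partition keys and clustering keys are designed and implemented for trajectory data in different application scenarios, so that trajectory objects that are near each other in space and time are stored near each other, which makes queries more efficient. Experimental results based on real application scenarios show that the proposed method effectively supports the storage and indexing of massive trajectory data and outperforms other spatiotemporal big data indexing and query methods in data insertion, querying, and scalability of the storage structure.

Objectives: Because real-time spatiotemporal data are being acquired at a fast-growing rate for applications such as smart cities and real-time air-quality monitoring, traditional database technologies cannot satisfy the higher requirements for indexing, querying, and storing large-scale data. As a viable alternative, NoSQL databases, which are scalable and offer fast input/output, are a potential solution. Methods: We propose an approach based on the Hilbert curve and Cassandra for efficiently indexing and storing large-scale spatiotemporal datasets, aiming to provide an effective framework for processing, querying, and analyzing large amounts of data with spatial and temporal features. For example, vehicle trajectory datasets contain valuable spatial and temporal features that are used in real-world applications. The collected spatiotemporal datasets are preprocessed to fit the proposed structures for different applications. Specifically, two types of queries commonly used in practice are considered: the spatiotemporal range query and the query by vehicle ID. Two corresponding indexing structures are designed and implemented to serve these requests. The S2 Geometry Library open-sourced by Google is used to divide the Earth's surface into grid cells, and data points that fall in a cell are assigned that cell's ID as their key.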
The keys and columns are designed with the Hilbert curve and Cassandra techniques so that the resulting structures physically store spatially neighboring data points close to each other, which makes them better suited to large-scale spatiotemporal querying and analysis. Results: Datasets acquired from real applications are used in computational experiments to validate the efficiency of the proposed approach. The query efficiency and the time consumed to store large amounts of spatiotemporal data are investigated and benchmarked against some existi
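The abstract relies on a Hilbert curve to turn 2-D grid cells into a 1-D key whose order preserves spatial proximity. A minimal sketch of that encoding follows, using the classic iterative xy-to-d conversion. It assumes a plain equirectangular lon/lat grid with 2**order cells per axis; the paper itself uses S2 cells, which order cells along a Hilbert curve on each face of a cube projection, so the function names `hilbert_index` and `lonlat_to_cell` are hypothetical stand-ins, not the authors' code or the S2 API.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) on an n x n grid (n a power of two) to its position d on the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level contributes s*s cells; (3*rx) ^ ry gives its visit order 0..3.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the lower-order bits are read in the sub-curve's own frame.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def lonlat_to_cell(lon, lat, order):
    """Hypothetical equirectangular grid with 2**order cells per axis (not S2's cube projection)."""
    n = 1 << order
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)  # clamp lon = 180 into the last column
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)   # clamp lat = 90 into the last row
    return hilbert_index(n, x, y)


if __name__ == "__main__":
    # Hilbert index of each cell in a 4 x 4 grid, top row first.
    for y in reversed(range(4)):
        print(" ".join(f"{hilbert_index(4, x, y):2d}" for x in range(4)))
    # Expected:
    #  5  6  9 10
    #  4  7  8 11
    #  3  2 13 12
    #  0  1 14 15
    print(lonlat_to_cell(121.47, 31.23, order=16))  # a cell index near Shanghai
```

Consecutive indices always land in edge-adjacent cells, so a key range over Hilbert indices tends to cover a compact spatial region; this is the locality property that lets neighboring points be stored close together.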

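The abstract states that separate partition keys and clustering keys are designed for the two query types (spatiotemporal range query and query by vehicle ID), but it does not give the schema. The sketch below models one plausible layout in memory, under stated assumptions: a range table partitioned by (cell, hour) and clustered by (timestamp, vehicle ID), and an ID table partitioned by (vehicle ID, day) and clustered by timestamp. Cell size, bucket lengths, and all names (`cell_of`, `TrajectoryStore`, `range_query`, `vehicle_query`) are hypothetical; a plain lon/lat grid stands in for S2/Hilbert cell IDs, and no Cassandra driver is used.

```python
from bisect import bisect_left, insort
from collections import defaultdict

CELL_DEG = 1 / 64  # hypothetical cell size in degrees (power of two keeps floor division exact)
HOUR_S = 3600      # hypothetical time bucket for the range-query table
DAY_S = 86400      # hypothetical time bucket for the vehicle-ID table


def cell_of(lon, lat):
    """Stand-in spatial cell ID on a plain lon/lat grid (the paper uses S2/Hilbert cells)."""
    return (int((lon + 180) // CELL_DEG), int((lat + 90) // CELL_DEG))


class TrajectoryStore:
    """Toy model of two Cassandra tables; each partition is a list kept sorted by clustering key."""

    def __init__(self):
        # Range table: partition key (cell, hour) -> rows clustered by (ts, vehicle_id).
        self.by_cell = defaultdict(list)
        # ID table: partition key (vehicle_id, day) -> rows clustered by ts.
        self.by_vehicle = defaultdict(list)

    def insert(self, vehicle_id, ts, lon, lat):
        # Denormalized write: each point is stored once per query pattern.
        insort(self.by_cell[(cell_of(lon, lat), ts // HOUR_S)], (ts, vehicle_id, lon, lat))
        insort(self.by_vehicle[(vehicle_id, ts // DAY_S)], (ts, lon, lat))

    def range_query(self, lon_min, lat_min, lon_max, lat_max, t0, t1):
        """Visit only partitions whose cell and hour can overlap the query window."""
        (c0, r0), (c1, r1) = cell_of(lon_min, lat_min), cell_of(lon_max, lat_max)
        hits = []
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                for hour in range(t0 // HOUR_S, t1 // HOUR_S + 1):
                    rows = self.by_cell.get(((c, r), hour), [])
                    # Clustering order: start at the first row with ts >= t0 ...
                    for ts, vid, lon, lat in rows[bisect_left(rows, (t0,)):]:
                        if ts > t1:  # ... and stop at the first row past t1.
                            break
                        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max:
                            hits.append((vid, ts, lon, lat))
        return sorted(hits)

    def vehicle_query(self, vehicle_id, t0, t1):
        """Read one vehicle's track from the day partitions covering [t0, t1]."""
        track = []
        for day in range(t0 // DAY_S, t1 // DAY_S + 1):
            rows = self.by_vehicle.get((vehicle_id, day), [])
            for ts, lon, lat in rows[bisect_left(rows, (t0,)):]:
                if ts > t1:
                    break
                track.append((ts, lon, lat))
        return track


if __name__ == "__main__":
    store = TrajectoryStore()
    store.insert("car-1", 1000, 121.470, 31.230)
    store.insert("car-2", 1500, 121.480, 31.240)
    store.insert("car-1", 5000, 121.500, 31.250)
    print(store.range_query(121.40, 31.20, 121.49, 31.30, 0, 3599))
    print(store.vehicle_query("car-1", 0, 86399))
```

Bucketing time into the partition key bounds partition size, while the clustering key keeps rows in timestamp order so a time window becomes a contiguous slice. In a real deployment the bucket lengths would be tuned to data density.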
Keywords: spatiotemporal big data; Cassandra; distributed storage; vehicle trajectory; key-value; spatial encoding

CLC number: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]