面向分布式列式存储的轨迹大数据k近邻查询  被引量:9

kNN Query Processing for Trajectory Big Data Based on Distributed Column-Oriented Storage

在线阅读下载全文

作  者:余列冰 向隆刚[1] 孙尚宇 关雪峰[1] 吴华意[1] YU Liebing;XIANG Longgang;SUN Shangyu;GUAN Xuefeng;WU Huayi(State Key Laboratory of Information Engineering in Surveying,Mapping and Remote Sensing,Wuhan University,Wuhan 430079,China)

机构地区:[1]武汉大学测绘遥感信息工程国家重点实验室,湖北武汉430079

出  处:《武汉大学学报(信息科学版)》2021年第5期736-745,共10页Geomatics and Information Science of Wuhan University

基  金:国家重点研发计划(2018YFB2100603);国家自然科学基金(41771474,41930107)。

摘  要:针对轨迹大数据的高效点-轨迹k近邻(point to trajectory k nearest neighbor, P2Tk NN)查询处理需求,提出了一种融合时空剖分和轨迹分段的轨迹组织方法,其核心思想是在对轨迹作时间剖分的基础上,利用离散全球网格系统(discrete global grid system, DGGS)在空间上进行再次剖分,从而利用两次剖分得到的时空单元编码来索引落入其中的轨迹片段。在此基础上利用分布式列式存储技术设计了面向轨迹大数据的P2Tk NN查询处理框架,提出了一种顾及轨迹数据空间分布的自适应空间单元搜索算法,即通过分析轨迹数据在给定时间约束下的空间分异特征,动态调整空间单元的搜索步长,从而提升了轨迹稀疏区域的处理效率。针对亿级轨迹的实验结果表明,该方法适用于轨迹大数据的P2Tk NN查询处理,在轨迹稠密与稀疏区域的平均查询响应时间均小于1 s。Objectives: With the advancement of mobile object tracking technologies, huge volumes of spatiotemporal data have been accumulated in the form of location trajectories. Such data bring us new opportunities and challenges in efficient trajectory retrieval. Among them, point to trajectory k nearest neighbor(P2 Tk NN) query plays a pivotal role in many mobile object-oriented analyses and applications. It has been widely used in tracing the source of infectious diseases, recommending travel routes, and generating and updating local road networks based on trajectories.Methods: Aiming at the requirement of efficient P2 Tk NN query processing for large-scale trajectory data, a trajectory organization method combining spatiotemporal segmentation and trajectory segmentation is proposed. The core idea is to use the discrete global grid system(DGGS) to segment the trajectory in spatial based on the previous time division. The spatiotemporal cell encoding is applied to index the trajectory segments falling in it. In order to realize efficient encoding of spatiotemporal cells, three concatenated encoding structures of spatiotemporal divide and conquer are proposed to solve the huge difference in dimension and scale between temporal dimension and spatial dimension. Based on above, a P2 Tk NN query processing framework for distributed column-oriented storage is designed, and an adaptive spatial cell search algorithm that takes into account the spatiotemporal distribution of trajectory data is proposed. The step size for spatial cell searching is adaptively set according to the spatial stratified heterogeneity of the trajectory data under the given time constraint, which greatly improves the search efficiency of areas with sparse trajectory data.Results: Experimental results based on billion-level trajectories show that the method is suitable for P2 Tk NN query processing of big trajectory data.(1) Different encoding structures are suitable for different query scenarios. Appropriate encoding structure can be used to make query r

关 键 词:轨迹大数据 K近邻查询 时空编码 自适应搜索 分布式列式存储 

分 类 号:P208[天文地球—地图制图学与地理信息工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象