基于Hilbert空间分区和Geohash索引的并行Ripley's K函数被引量：4

Parallel Ripley’s K Function based on Hilbert Spatial Partitioning and Geohash Indexing

作　　者：亢扬箫桂志鹏[1,3] 丁劲宸吴京航吴华意[3] KANG Yangxiao;GUI Zhipeng;DING Jinchen;WU Jinghang;WU Huayi(School of Remote Sensing and Information Engineering,Wuhan University,Wuhan 430079,China;Chongqing Geomatics and Remote Sensing Center,Chongqing 401147,China;State Key Laboratory of Information Engineering in Surveying,Mapping and Remote Sensing,Wuhan University,Wuhan 430079,China)

机构地区：[1]武汉大学遥感信息工程学院,武汉430079 [2]重庆市地理信息与遥感应用中心,重庆401147 [3]武汉大学测绘遥感信息工程国家重点实验室,武汉430079

出　　处：《地球信息科学学报》2022年第1期74-86,共13页Journal of Geo-information Science

基　　金：国家重点研发计划项目(2018YFC0809806,2017YFB0503704);国家自然科学基金项目(41971349,42090010)。

摘　　要：作为二阶点模式分析方法,Ripley’s K函数(简称K函数)以距离为自变量探测不同尺度下点事件的分布模式及演变规律,在生态学、经济学、地理学等诸多领域得到广泛应用。然而,随着点规模的增加,估计与模拟阶段点对距离遍历计算时间开销激增,严重制约了K函数的应用,算法流程优化与并行加速成为应对海量点数据下K函数性能瓶颈及可计算性问题的关键技术手段。针对默认数据分区未考虑点事件空间邻近性导致跨节点通讯成本高昂且K函数距离阈值较大时索引优化失效的现象,本文提出一种基于空间填充曲线的K函数优化加速方法。该方法采用Hilbert曲线构建空间分区,在顾及数据空间邻近性的前提下减少分区间数据倾斜和通讯开销;在分区基础上,利用Geohash编码改进各分区内本地空间索引策略加速点对距离计算。本文以湖北省工商企业注册数据为例,通过对比实验分析了默认分区无索引、KDB分区组合R树索引、本文Hilbert分区组合Geohash索引算法在不同数据规模、距离阈值、集群规模下的计算耗时。结果表明,300 000点数据规模下本文方法的时间开销约为默认分区无索引方法的1/4,9台节点下加速比超过3.6倍。因此,该方法能有效提升分布式环境下K函数计算性能并具有良好的可伸缩性,可为其他点模式分析方法的优化提供参考。As a second-order analysis method of spatial point patterns, Ripley’s K function(K function for short)uses distance as an independent variable to detect the distribution patterns of points under different spatial scales,which has been widely used in distinct fields such as ecology, economics, and geography. However, the applications of K function are limited due to the sharply increased computational cost of nested traversals on the point-pair distance measurements in both estimation and simulation phases when the point size getting larger.Therefore, the optimization of algorithm workflow and parallel acceleration have become the key technologies for tackling the performance bottleneck and computability problem of K function. Among these solutions, hashbased partitioning has been widely adopted in parallel computing frameworks for enabling data decomposition,while R-tree indexes have been proposed to reduce the computational cost of point-pair distance measurements by using spatial query instead. However, default hash-based partitioning methods ignore the spatial proximity of data distributions, while R-tree indexes fail to save query time of neighboring points under large spatial distance threshold comparing with pointwise distance calculation. In order to address these issues, this paper proposes an acceleration method for K function based on the space filling curves. Specifically, the Hilbert curve is adopted to achieve spatial partitioning, which reduces the data tilt and communication cost between partitions by better considering the spatial proximity. Upon the partition result, local indexing based on Geohash code is further developed to improve the spatial indexing strategy, which embeds spatial information in codes for achieving quick distance filtering, in turn accelerates the pointwise distance measurements. To verify the effectiveness of the proposed method, it is compared with two optimization methods adopted in previous studies, i.e., default partition without indexing, and KDB-tree partition with

关键词：Ripley’s K函数分布式计算 Apache Spark 高性能地理计算 HILBERT曲线 Geohash编码点模式分析空间填充曲线

分类号：P208[天文地球—地图制图学与地理信息工程]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于Hilbert空间分区和Geohash索引的并行Ripley's K函数被引量：4

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于Hilbert空间分区和Geohash索引的并行Ripley's K函数 被引量：4

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

基于Hilbert空间分区和Geohash索引的并行Ripley's K函数被引量：4