机构地区:[1]武汉大学遥感信息工程学院,武汉430079 [2]重庆市地理信息与遥感应用中心,重庆401147 [3]武汉大学测绘遥感信息工程国家重点实验室,武汉430079
出 处:《地球信息科学学报》2022年第1期74-86,共13页Journal of Geo-information Science
基 金:国家重点研发计划项目(2018YFC0809806,2017YFB0503704);国家自然科学基金项目(41971349,42090010)。
摘 要:作为二阶点模式分析方法,Ripley’s K函数(简称K函数)以距离为自变量探测不同尺度下点事件的分布模式及演变规律,在生态学、经济学、地理学等诸多领域得到广泛应用。然而,随着点规模的增加,估计与模拟阶段点对距离遍历计算时间开销激增,严重制约了K函数的应用,算法流程优化与并行加速成为应对海量点数据下K函数性能瓶颈及可计算性问题的关键技术手段。针对默认数据分区未考虑点事件空间邻近性导致跨节点通讯成本高昂且K函数距离阈值较大时索引优化失效的现象,本文提出一种基于空间填充曲线的K函数优化加速方法。该方法采用Hilbert曲线构建空间分区,在顾及数据空间邻近性的前提下减少分区间数据倾斜和通讯开销;在分区基础上,利用Geohash编码改进各分区内本地空间索引策略加速点对距离计算。本文以湖北省工商企业注册数据为例,通过对比实验分析了默认分区无索引、KDB分区组合R树索引、本文Hilbert分区组合Geohash索引算法在不同数据规模、距离阈值、集群规模下的计算耗时。结果表明,300 000点数据规模下本文方法的时间开销约为默认分区无索引方法的1/4,9台节点下加速比超过3.6倍。因此,该方法能有效提升分布式环境下K函数计算性能并具有良好的可伸缩性,可为其他点模式分析方法的优化提供参考。As a second-order analysis method of spatial point patterns, Ripley’s K function(K function for short)uses distance as an independent variable to detect the distribution patterns of points under different spatial scales,which has been widely used in distinct fields such as ecology, economics, and geography. However, the applications of K function are limited due to the sharply increased computational cost of nested traversals on the point-pair distance measurements in both estimation and simulation phases when the point size getting larger.Therefore, the optimization of algorithm workflow and parallel acceleration have become the key technologies for tackling the performance bottleneck and computability problem of K function. Among these solutions, hashbased partitioning has been widely adopted in parallel computing frameworks for enabling data decomposition,while R-tree indexes have been proposed to reduce the computational cost of point-pair distance measurements by using spatial query instead. However, default hash-based partitioning methods ignore the spatial proximity of data distributions, while R-tree indexes fail to save query time of neighboring points under large spatial distance threshold comparing with pointwise distance calculation. In order to address these issues, this paper proposes an acceleration method for K function based on the space filling curves. Specifically, the Hilbert curve is adopted to achieve spatial partitioning, which reduces the data tilt and communication cost between partitions by better considering the spatial proximity. Upon the partition result, local indexing based on Geohash code is further developed to improve the spatial indexing strategy, which embeds spatial information in codes for achieving quick distance filtering, in turn accelerates the pointwise distance measurements. To verify the effectiveness of the proposed method, it is compared with two optimization methods adopted in previous studies, i.e., default partition without indexing, and KDB-tree partition with
关 键 词:Ripley’s K函数 分布式计算 Apache Spark 高性能地理计算 HILBERT曲线 Geohash编码 点模式分析 空间填充曲线
分 类 号:P208[天文地球—地图制图学与地理信息工程]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...