采样过滤下的时空热点查询  

Spatio-temporal Hot Spot Detection Based on Sampling and Filtering

在线阅读下载全文

作  者:李艳云 牛保宁 康家兴 LI Yanyun;NIU Baoning;KANG Jiaxing(College of Information and Computer,Taiyuan University of Technology,Taiyuan 030024,China)

机构地区:[1]太原理工大学信息与计算机学院,太原030024

出  处:《太原理工大学学报》2020年第4期522-529,共8页Journal of Taiyuan University of Technology

基  金:国家重点研发计划基金资助项目(2017YFB1401001-01);山西省重点研发计划基金资助项目(国际科技合作)(201903D421007)。

摘  要:目前较高效的时空热点查询算法是基于Spark分布式计算框架和Getis-Ord统计量的两阶段map-reduce算法。为了解决其在第一阶段map-reduce,遍历全部轨迹数据导致耗时严重及数据分布不均匀导致资源空闲的问题,本文提出一种对轨迹数据采样的方法S-RSampling(stratified-random sampling),通过分析轨迹数据的分布规律,确定采样规模,从而减轻数据分布不均的影响,大幅降低查询时间;为了解决在第二阶段map-reduce,计算大量无用立方单元格导致计算浪费的问题,本文提出一种阈值过滤方法TFiltering(threshold filtering),根据单元格属性值的分布,动态确定阈值T,将属性值top-T的立方单元格作为热点候选集,从而减少计算浪费。实验表明,本文所提出的优化方法在查询结果准确率不降低的情况下能大幅降低查询响应时间。The current efficient algorithm of querying spatio-temporal hot spot is the two-stage map-reduce algorithm based on the Spark distributed computing framework and Getis-Ord statistics.A method S-RSampling(Stratified-Random Sampling)of sampling trajectory data was proposed to solve the problem that traversing all the trajectory data causes serious time consuming and the uneven data distribution causes the resources to be idle in the first stage map-reduce.The method analyzes the distribution of trajectory data to determine the sampling scale,thereby reducing the impact of uneven data distribution and query time greatly.A threshold filtering method TFiltering(Threshold Filtering)was proposed to solve the problem that processing a large number of useless cubes leads to calculation waste in the second stage map-reduce.The method dynamically determines the threshold according to the distribution of cubes’value and takes the cubes of value top-T as candidate sets of hot spots,which reduces the calculation waste.Experiments show that the optimization methods proposed in this paper can reduce the query response time greatly without reducing the accuracy of query results.

关 键 词:时空热点 分布式计算 规律采样 阈值过滤 热点候选集 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象