Urban Hotspot Detection from the Data Stream of Taxi Pick-up and Drop-off based on Distributed Multistage Grid Clustering (cited by: 3)


Authors: WANG Haocheng, XIANG Longgang[1], GUAN Xuefeng[1], ZHANG Yeting[1] (State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China)

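Affiliation: [1] State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079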

Source: Journal of Geo-information Science, 2023, No. 7, pp. 1514-1530 (17 pages)

Funding: Special Fund of Hubei Luojia Laboratory (220100010); Major Science and Technology Project of Hubei Province (2020AAA004).

Abstract: Real-time detection of urban hotspot areas can improve city managers' ability to respond to emergencies. With the development of the Internet of Things and communication technologies, the origin and destination information of taxi trips is uploaded to data centers in real time, forming a continuous stream of pick-up and drop-off events. Because taxis operate around the clock, cover the whole city, and produce data with high spatio-temporal resolution, this pick-up and drop-off stream is an effective information source for real-time hotspot detection. However, existing hotspot detection methods designed for static pick-up and drop-off datasets do not support streaming data and cannot be applied directly to real-time detection, while existing stream clustering algorithms struggle to meet three requirements at once: low aggregation cost, recognition of arbitrarily shaped clusters, and flexible scalability. To address these challenges, this study uses distributed stream processing to design an urban hotspot detection algorithm for taxi pick-up and drop-off streams. The basic idea is to map the stream onto grid-shaped monitoring units, compute the heat of each unit for every time window, identify hot units (units whose heat exceeds a specified threshold) in a distributed manner, and finally aggregate the hot units of the same time window into hotspot areas. To avoid a performance bottleneck at the aggregation operator of the distributed algorithm, the study further designs a multistage distributed region-merging algorithm consisting of five steps: redundant partitioning, link identification, correction-rule construction, region-ID correction, and region generation. The algorithm is implemented on the distributed stream processing framework Apache Flink, and experiments use data streams simulated from taxi trip records of Wuhan and New York City. The results show that the algorithm efficiently mines the distribution of hotspot areas across urban space and its dynamics (the spatial distribution and status of hotspots vary over time in relation to citizens' activities), and that it reaches a throughput of up to 90,000 records/s at a parallelism of 8, demonstrating good performance and scalability.
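The grid pipeline summarized in the abstract (map events to cells, count heat per time window, keep cells above a threshold, then merge hot cells into regions in several stages) can be illustrated with a small single-process Python sketch. It is not the authors' Flink implementation, and the abstract does not give their exact rules. The following are all assumptions chosen to mirror the stage names: projected x/y coordinates in metres, tumbling windows, 8-connectivity between cells, vertical strip partitions with a one-column halo as the "redundant partitioning", and union-find as the "correction rules". Every function name is hypothetical.

```python
from collections import defaultdict, deque


def to_cell(x, y, cell_size):
    """Map a projected coordinate (assumed to be in metres) to a grid cell."""
    return (int(x // cell_size), int(y // cell_size))


def neighbours(cell):
    """8-neighbourhood of a cell; the connectivity rule is an assumption."""
    cx, cy = cell
    return [(cx + dx, cy + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def local_labels(cells):
    """Label connected components of hot cells within one partition (BFS)."""
    labels, next_id = {}, 0
    for start in sorted(cells):
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            for n in neighbours(queue.popleft()):
                if n in cells and n not in labels:
                    labels[n] = next_id
                    queue.append(n)
        next_id += 1
    return labels


def merge_regions(hot_cells, strip_width):
    """Merge hot cells into regions in stages, simulated in one process."""
    # Redundant partitioning (assumed scheme): vertical strips of
    # `strip_width` columns; each strip also receives its neighbours'
    # boundary columns as a halo.
    owned, parts = defaultdict(set), defaultdict(set)
    for cell in hot_cells:
        p = cell[0] // strip_width
        owned[p].add(cell)
        parts[p].add(cell)
        if cell[0] % strip_width == 0:
            parts[p - 1].add(cell)   # halo copy for the strip on the left
        if cell[0] % strip_width == strip_width - 1:
            parts[p + 1].add(cell)   # halo copy for the strip on the right
    labels = {p: local_labels(cells) for p, cells in parts.items()}

    # Link identification: a cell seen by several partitions links the
    # local components that contain it.
    seen = defaultdict(list)
    for p, lab in labels.items():
        for cell, lid in lab.items():
            seen[cell].append((p, lid))

    # Correction rules: union-find over (partition, local id) pairs.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for ids in seen.values():
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    # Region-ID correction and region generation: each owned cell takes the
    # root of its local label; cells sharing a root form one hotspot region.
    regions = defaultdict(set)
    for p, cells in owned.items():
        for cell in cells:
            regions[find((p, labels[p][cell]))].add(cell)
    return [frozenset(r) for r in regions.values()]


def detect_hotspots(records, cell_size, window_s, threshold, strip_width):
    """records: (timestamp_s, x, y) tuples. Returns {window_index: regions}."""
    heat = defaultdict(int)
    for ts, x, y in records:
        heat[(int(ts // window_s), to_cell(x, y, cell_size))] += 1
    hot = defaultdict(set)
    for (w, cell), count in heat.items():
        if count > threshold:          # hot unit: heat above the threshold
            hot[w].add(cell)
    return {w: merge_regions(cells, strip_width) for w, cells in hot.items()}
```

With 8-connectivity, two hot cells that touch across a strip boundary lie in adjacent columns. Both cells are therefore copied into both strips. Linking the copies of each shared cell is enough to stitch regions together, so no single operator has to see every hot cell. Strip width plays the role of a parallelism knob here. Changing it should change only where the work happens, not the resulting regions.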

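Keywords: taxi pick-up and drop-off; data stream; hotspot area detection; distributed computing; grid clustering; real time; Wuhan; New York City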

CLC number: U492.434 [Transportation Engineering / Transportation Planning and Management]
