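Affiliations: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023; [2] Department of Computer Science and Technology, Nanjing University, Nanjing 210023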
Source: Chinese Journal of Computers (《计算机学报》), 2020, No. 12, pp. 2259-2275 (17 pages)
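Funding: National Natural Science Foundation of China (61572250, U1811461); Jiangsu Province Science and Technology Support Program (BE2017155); Jiangsu Collaborative Innovation Center of Novel Software Technology and Industrialization.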
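Abstract: With the widespread adoption of information technology, urban public service hotline platforms have accumulated a large amount of public-livelihood complaint data awaiting analysis. Traditional event detection methods do not consider regional patterns, and the GPS geographic information they rely on is hard to obtain. Existing bursty event detection methods therefore cannot be applied directly to mine the potential public-livelihood bursty events contained in public service hotline data. To address this, this paper proposes a real-time region-adaptive bursty event detection method (RAEDetection). First, a bursty word recognition algorithm based on an incremental Kleinberg model is proposed; it overcomes the limitations of the existing batch Kleinberg model and identifies bursty words in real time from incremental streaming data. Next, a candidate bursty event recognition algorithm based on hierarchical semantic analysis is proposed: using bursty words as clues, it first determines bursty topic events from the topic-level semantics of the bursty words, then further divides each bursty topic event into multiple candidate bursty events according to the event-level semantics of the complaint records. Finally, a region-pattern adaptive recognition algorithm based on an event region tree is proposed: it builds an event region tree with three levels (city, district, and street) and, by testing and optimizing each event's regional distribution with the KL distance, adaptively recognizes the regional pattern in which each event occurs, filters noise records out of the candidate bursty events, and yields the final bursty events. Experiments on a real urban public service dataset and a Twitter dataset show that, compared with state-of-the-art methods, the proposed method achieves higher detection accuracy and faster computation, effectively detects bursty events in data streams, and has good data scalability and system scalability. The method has been deployed on the Jiangsu Province public service hotline platform, where it provides efficient, automated, and intelligent bursty event detection.
With the popularization of information technology, civic public service platforms have accumulated a large volume of public-livelihood complaint data that need to be analyzed. Traditional event detection methods do not take the regional patterns of events into consideration, and the GPS geographic information these methods use is not easy to obtain. Many studies therefore seek efficient and accurate methods for recognizing the regional patterns of events. However, existing event detection methods cannot efficiently capture the potential events in civic public service data. In this paper, we propose a real-time region-adaptive method for bursty event detection, called RAEDetection. First, recognizing bursty words in the data stream is the basis for discovering bursty events. The traditional Kleinberg model can only find bursty words in static data, so we propose an improved incremental Kleinberg model that identifies bursty words from the real-time data stream. Then, after obtaining the bursty words, we propose an algorithm based on hierarchical semantic analysis for recognizing candidate bursty events. Using bursty words as clues, this algorithm finds bursty topic events from topic-level semantic information and then divides these events into finer-grained candidate bursty events using the semantic information of the complaint records. Finally, to filter out the noise records in the candidate events, an event region tree is constructed to recognize the regional patterns of events. The event region tree has a three-level structure corresponding to addresses at the city, district, and street levels, respectively. Following the maximum entropy principle, we assume that the address distribution of a given event obeys a discrete uniform distribution. We use the KL distance to measure the distance between the observed address distribution and the assumed address distribution. We choose the number of addresses which can mini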
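The first step of RAEDetection builds on Kleinberg's burst model. The abstract does not specify the incremental variant, so the sketch below instead shows the classic batch, two-state form of Kleinberg's model on batched counts, which is the baseline the paper says it improves on. For each time window t it takes d[t] (all records) and r[t] (records containing the tracked word); a Viterbi pass picks the 0/1 state sequence that minimises the negative binomial log-likelihood plus a penalty of gamma * ln(n) each time the burst state is entered. The function name `kleinberg_bursts` and the defaults `s=2.0`, `gamma=1.0` are illustrative assumptions, not the authors' settings.

```python
import math
from typing import List


def kleinberg_bursts(r: List[int], d: List[int],
                     s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Classic (batch) two-state Kleinberg burst detection over batched counts.

    Hypothetical helper for illustration, not the paper's incremental model.
    r[t]: records in window t that contain the tracked word
    d[t]: all records in window t
    Returns one state per window: 0 = base rate, 1 = bursty.
    """
    n = len(r)
    if n != len(d):
        raise ValueError("r and d must have the same length")
    total_r, total_d = sum(r), sum(d)
    # Degenerate streams (word never or always present) have no burst to find.
    if n == 0 or total_r == 0 or total_r == total_d:
        return [0] * n

    p0 = total_r / total_d                  # base-state emission rate
    p1 = min(s * p0, 1.0 - 1e-9)            # burst-state rate, kept below 1
    rates = (p0, p1)
    up_cost = gamma * math.log(n)           # penalty for entering state 1

    def emit_cost(state: int, rt: int, dt: int) -> float:
        # Negative log binomial likelihood; the binomial coefficient is the
        # same for both states, so it cannot change the optimum and is dropped.
        p = rates[state]
        return -(rt * math.log(p) + (dt - rt) * math.log(1.0 - p))

    # Viterbi: cost[j] = cheapest state sequence so far that ends in state j.
    cost = [0.0, math.inf]                  # the stream starts in the base state
    back: List[List[int]] = []
    for rt, dt in zip(r, d):
        new_cost, ptr = [0.0, 0.0], [0, 0]
        for j in (0, 1):
            from_base = cost[0] + (up_cost if j == 1 else 0.0)
            from_burst = cost[1]            # staying in or leaving state 1 is free
            if from_base <= from_burst:
                new_cost[j], ptr[j] = from_base, 0
            else:
                new_cost[j], ptr[j] = from_burst, 1
            new_cost[j] += emit_cost(j, rt, dt)
        back.append(ptr)
        cost = new_cost

    # Trace back the optimal state sequence from the cheaper final state.
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    return states


if __name__ == "__main__":
    # Hypothetical counts: the word jumps from 2% to about 20% of records in windows 3 and 4.
    print(kleinberg_bursts([2, 2, 2, 20, 22, 2, 2], [100] * 7))  # [0, 0, 0, 1, 1, 0, 0]
```

Raising `gamma` makes the burst state harder to enter, and raising `s` demands a sharper rise in frequency before a window counts as bursty. Because this batch form re-runs over the whole history whenever new data arrive, it also illustrates the limitation on streaming data that the paper's incremental Kleinberg model is said to overcome.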
Keywords: event detection; burst analysis; region adaptation; public service hotline; data mining
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]