机构地区:[1]云南大学信息学院,昆明650504
出 处:《计算机学报》2022年第3期526-541,共16页Chinese Journal of Computers
基 金:国家自然科学基金项目(61966036,61662086);云南省创新团队项目(2018HC019);云南大学研究生科研创新基金项目(2020315)资助
摘 要:空间并置(co-location)模式挖掘是指在大量的空间数据中发现一组空间特征的子集,这些特征的实例在地理空间中频繁并置出现.传统的空间并置模式挖掘算法通常采用逐阶递增的挖掘框架,从低阶模式开始生成候选模式并计算其参与度(空间并置模式的频繁性度量指标).虽然这种挖掘框架可以得到正确和完整的结果,但是带来的时间和空间开销非常大.此外传统方法对于空间并置模式的最小频繁性阈值较为敏感,当最小频繁性阈值改变时整个挖掘过程需要重新进行.因此,本文提出一种基于极大团和哈希表的空间并置模式挖掘算法CPM-MCHM(Co-location Pattern Mining based on Maximal Clique and Hash Map)来发现完整并且正确的频繁空间并置模式.CPM-MCHM算法不仅避免逐阶候选-测试框架带来的巨大开销问题,还降低了算法对最小频繁性阈值的敏感.首先,采用基于位运算的分区Bron–Kerbosch算法生成给定空间数据集的所有极大团,并将其存储在哈希表中.然后,提出一种两阶段挖掘框架计算所有模式的参与度并过滤所有频繁空间并置模式.最后,在真实和合成数据集上进行了大量的对比实验.与经典的传统算法和近两年内学者提出的两种算法相比,当实验数据的规模达到20万实例数时,本文提出的CPM-MCHM算法的挖掘时间和空间耗费分别降低了90%和70%以上,当实验数据量进一步加大时CPM-MCHM算法的优势更加明显.Spatial co-location pattern mining refers to the discovery of a set of spatial features in large spatial data sets,and instances of these features frequently appear together in the geographic space.The traditional co-location pattern mining algorithms usually employ an incremental mining framework,generating high-size candidate patterns from the low-size prevalent patterns and compute their participation index(prevalence metrics of the co-location pattern).The participation indexes of candidate patterns are compared with a minimum prevalence threshold given by users to filter prevalent co-location patterns.This iterative process continues to be executed size-by-size,all prevalent co-location patterns are generated completely.Although this mining framework can yield a correct and complete mining result,the cost of execution time and memory space is very expensive.In addition,this mining framework is very sensitive to the minimum prevalence threshold used to filtering prevalent co-location patterns.If the minimum prevalence threshold is changed,the whole mining process needs to be restarted.Since the row instances supporting the prevalence of a co-location pattern are a clique relation under the neighbor relationship,all row instances of the co-location patterns are included in the maximal cliques of the spatial data set,thus they can be used to generate all row instances of all spatial co-location patterns.Therefore,under the premise of discovery of complete and correct prevalent spatial co-location patterns,a Co-location Pattern Mining algorithm based on Maximal Clique and Hash Map(CPM-MCHM)is proposed to address the above shortcomings.The CPM-MCHM algorithm not only avoids the problem of the expensive running time posed by the size-by-size candidate-test framework,but also reduces the sensitivity of the algorithm to the minimum prevalence threshold.Firstly,a bit operation-based and partition Bron-Kerbosch algorithm is presented to enumerate all the maximal cliques of an input spatial data set.The performance of
关 键 词:空间数据挖掘 空间并置模式 两阶段挖掘框架 极大团 哈希表
分 类 号:TP18[自动化与计算机技术—控制理论与控制工程]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...