基于Nodeset的最大频繁项集挖掘算法被引量：6

Maximal Frequent Itemset Mining Algorithm Based on Nodeset

机构地区：[1]华东师范大学计算机科学技术系,上海200241

出　　处：《计算机工程》2016年第12期204-207,216,共5页Computer Engineering

基　　金：上海市国际科技合作项目(13430710100);上海市科委科技创新行动计划项目(13511506201)

摘　　要：递归遍历、条件FP-Tree构建与超集检测是多数基于FP-Tree最大频繁项集挖掘算法的主要性能瓶颈。为此,提出一种基于Nodeset的最大频繁项集挖掘算法——MFIN算法。该算法采用Nodeset数据结构对POC-Tree的节点编码,将集合枚举树作为搜索空间,避免递归遍历和条件FP-Tree构建的时间开销。设计提前停止方法提高求解Nodeset交集的效率,采用父等价剪枝技术和前瞻剪枝技术缩小搜索空间。对基于MFI-Tree的投影策略进行改进,提升超集检测的速度。实验结果表明,MFIN算法在mushroom,pumsb,webdocs数据集上的运行时间及执行效率等总体性能明显优于基于FP-Tree的FP-Max算法。The major performance bottlenecks of most maximal frequent itemset mining algorithms based on FP-Tree are caused by recursively traversing and constructing conditional FP-Trees and superset check. Therefore, this paper proposes a maximal frequent itemset mining algorithm based on Nodeset named MFIN. Instead of recursively traversing and constructing conditional FP-Trees,this algorithm adopts a novel data structure called Nodeset to encode the nodes of POC-Tree and uses a set-enumeration tree to represent search space. Besides, this paper proposes an early-stopping method to minimize the cost of intersection operation ahead pruning technique to reduce the search of Nodesets and adopts parent equivalence pruning technique and look- space. The efficiency of superset check method is promoted by improving the projection strategy based on MFI-Tree. Experimental results show that the performance of MFIN algorithm is superior to the classic FP-Max algorithm which is based on FP-Tree in running time and efficiency on mushroom, pumsb and webdocs datasets.

关键词：最大频繁项集关联规则剪枝技术前缀树超集检测

分类号：TP391[自动化与计算机技术—计算机应用技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于Nodeset的最大频繁项集挖掘算法被引量：6

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于Nodeset的最大频繁项集挖掘算法 被引量：6

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

基于Nodeset的最大频繁项集挖掘算法被引量：6