基于Nodeset的最大频繁项集挖掘算法  被引量:6

Maximal Frequent Itemset Mining Algorithm Based on Nodeset

在线阅读下载全文

作  者:林晨[1] 顾君忠[1] 

机构地区:[1]华东师范大学计算机科学技术系,上海200241

出  处:《计算机工程》2016年第12期204-207,216,共5页Computer Engineering

基  金:上海市国际科技合作项目(13430710100);上海市科委科技创新行动计划项目(13511506201)

摘  要:递归遍历、条件FP-Tree构建与超集检测是多数基于FP-Tree最大频繁项集挖掘算法的主要性能瓶颈。为此,提出一种基于Nodeset的最大频繁项集挖掘算法——MFIN算法。该算法采用Nodeset数据结构对POC-Tree的节点编码,将集合枚举树作为搜索空间,避免递归遍历和条件FP-Tree构建的时间开销。设计提前停止方法提高求解Nodeset交集的效率,采用父等价剪枝技术和前瞻剪枝技术缩小搜索空间。对基于MFI-Tree的投影策略进行改进,提升超集检测的速度。实验结果表明,MFIN算法在mushroom,pumsb,webdocs数据集上的运行时间及执行效率等总体性能明显优于基于FP-Tree的FP-Max算法。The major performance bottlenecks of most maximal frequent itemset mining algorithms based on FP-Tree are caused by recursively traversing and constructing conditional FP-Trees and superset check. Therefore, this paper proposes a maximal frequent itemset mining algorithm based on Nodeset named MFIN. Instead of recursively traversing and constructing conditional FP-Trees,this algorithm adopts a novel data structure called Nodeset to encode the nodes of POC-Tree and uses a set-enumeration tree to represent search space. Besides, this paper proposes an early-stopping method to minimize the cost of intersection operation ahead pruning technique to reduce the search of Nodesets and adopts parent equivalence pruning technique and look- space. The efficiency of superset check method is promoted by improving the projection strategy based on MFI-Tree. Experimental results show that the performance of MFIN algorithm is superior to the classic FP-Max algorithm which is based on FP-Tree in running time and efficiency on mushroom, pumsb and webdocs datasets.

关 键 词:最大频繁项集 关联规则 剪枝技术 前缀树 超集检测 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象