检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]华东师范大学计算机科学技术系,上海200241
出 处:《计算机工程》2016年第12期204-207,216,共5页Computer Engineering
基 金:上海市国际科技合作项目(13430710100);上海市科委科技创新行动计划项目(13511506201)
摘 要:递归遍历、条件FP-Tree构建与超集检测是多数基于FP-Tree最大频繁项集挖掘算法的主要性能瓶颈。为此,提出一种基于Nodeset的最大频繁项集挖掘算法——MFIN算法。该算法采用Nodeset数据结构对POC-Tree的节点编码,将集合枚举树作为搜索空间,避免递归遍历和条件FP-Tree构建的时间开销。设计提前停止方法提高求解Nodeset交集的效率,采用父等价剪枝技术和前瞻剪枝技术缩小搜索空间。对基于MFI-Tree的投影策略进行改进,提升超集检测的速度。实验结果表明,MFIN算法在mushroom,pumsb,webdocs数据集上的运行时间及执行效率等总体性能明显优于基于FP-Tree的FP-Max算法。The major performance bottlenecks of most maximal frequent itemset mining algorithms based on FP-Tree are caused by recursively traversing and constructing conditional FP-Trees and superset check. Therefore, this paper proposes a maximal frequent itemset mining algorithm based on Nodeset named MFIN. Instead of recursively traversing and constructing conditional FP-Trees,this algorithm adopts a novel data structure called Nodeset to encode the nodes of POC-Tree and uses a set-enumeration tree to represent search space. Besides, this paper proposes an early-stopping method to minimize the cost of intersection operation ahead pruning technique to reduce the search of Nodesets and adopts parent equivalence pruning technique and look- space. The efficiency of superset check method is promoted by improving the projection strategy based on MFI-Tree. Experimental results show that the performance of MFIN algorithm is superior to the classic FP-Max algorithm which is based on FP-Tree in running time and efficiency on mushroom, pumsb and webdocs datasets.
关 键 词:最大频繁项集 关联规则 剪枝技术 前缀树 超集检测
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.65