基于DiffNodeset结构的最大频繁项集挖掘算法  被引量:5

Maximal frequent itemset mining algorithm based on DiffNodeset structure

在线阅读下载全文

作  者:尹远 张昌[1] 文凯 郑云俊[1] YIN Yuan;ZHANG Chang;WEN Kai;ZHENG Yunjun(Institute of Communication Engineering Application Institute of Applied Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;Chongqing Branch, China Telecom Company Limited, Chongqing 401121, China;Chongqing Information Technology Designing Company Limited, Chongqing 401121, China)

机构地区:[1]重庆邮电大学通信新技术应用研究中心,重庆400065 [2]中国电信股份有限公司重庆分公司,重庆401121 [3]重庆信科设计有限公司,重庆401121

出  处:《计算机应用》2018年第12期3438-3443,共6页journal of Computer Applications

摘  要:在数据挖掘中,通过挖掘最大频繁项集来代替挖掘频繁项集可以大大地提升系统的运行效率。针对现有的最大频繁项集挖掘算法的运行时间消耗仍然很大的问题,提出了一种基于DiffNodeset结构的最大频繁项集挖掘(DNMFIM)算法。首先,采用了一种新的数据结构DiffNodeset来实现求交集以及支持度的快速计算;其次,引入一种新的线性复杂度的连接方法来降低两个DiffNodeset在连接过程中的复杂度,避免了多次的无效计算;然后,将集合枚举树作为搜索空间,同时采用多种优化剪枝策略来缩小搜索空间;最后,再结合最大频繁项集挖掘算法(MAFIA)中所使用的超集检测技术来有效地提高算法的准确性。实验结果表明,DNMFIM算法在时间效率方面性能优于MAFIA与基于N-list的MAFIA(NB-MAFIA),该算法在不同类型数据集中进行最大频繁项集挖掘时均有良好的效果。In data mining,mining maximum frequent itemsets instead of mining frequent itemsets can greatly improve the operating efficiency of system.The running time consumption of existing maximum frequent itemset mining algorithms is still very large.In order to solve the problem,a new DiffNodeset Maxmal Frequent Itemset Mining(DNMFIM)algorithm was proposed.Firstly,a new data structure DiffNodeset was adopted to realize the fast calculation of intersection and support degree.Secondly,the connection method with linear time complexity was adopted to reduce the complexity of connecting two DiffNodesets and avoid multiple invalid calculations.Then,the set-enumeration tree was adopted as the search space,and a variety of optimal pruning strategies were used to reduce the search space.Finally,the superset detection technology used in the MAximal Frequent Itemset Algorithm(MAFIA)algorithm was adopted to improve the accuracy of algorithm effectively.The experimental results show that,DNMFIM algorithm outperforms MAFIA and N-list based MAFIA(NB-MAFIA)in terms of time efficiency.The proposed algorithm has a good performance when mining the maximal frequent itemsets in different types of datasets.

关 键 词:最大频繁项集挖掘 关联规则 集合枚举树 优化剪枝 超集检测 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象