AN IMPROVED MINING ALGORITHM FOR FREQUENT ITEMSETS BASED ON N-LIST

Cited by: 6


Authors: Zhai Yue (翟悦)[1]; Wang Can (王璨)[1]; Sun Jianyan (孙建言) (Department of Information Science, Dalian Institute of Science and Technology, Dalian 116052, Liaoning, China)

Affiliation: [1] Department of Information Science, Dalian Institute of Science and Technology, Dalian 116052, Liaoning, China

Source: Computer Applications and Software (《计算机应用与软件》), 2018, No. 9, pp. 67-72 (6 pages)

Abstract: Frequent itemset mining over massive data is time-consuming, and the N-List structure proposed in recent years can effectively improve mining efficiency. Building on N-List, this paper proposes a new frequent itemset mining algorithm, HNSFI (Hash table and subsume frequent itemsets mining based on N-List). The algorithm uses a PPC-tree to generate N-Lists and introduces a hash table to store the itemsets represented by N-Lists, which speeds up the N-List intersection operation. It also introduces the concept of the subsuming index; using its properties, some frequent itemsets can be generated directly by combination, which further improves the algorithm's time performance. The algorithm was tested and analyzed on three different datasets, and the experimental results show that its time performance is the best on dense datasets.
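The PPC-tree and N-List steps named in the abstract can be illustrated with a minimal Python sketch. It follows the general N-List idea (pre-order and post-order codes on a prefix tree, with an ancestor test used to intersect lists) and is not the HNSFI implementation. The function names, the plain `dict` used as the hash table, and the quadratic intersection loop are all assumptions made for illustration; published N-List algorithms typically use a linear merge instead.

```python
from collections import Counter


class Node:
    """One PPC-tree node: an item, its count, and pre/post-order codes."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}
        self.pre = -1
        self.post = -1


def build_nlists(transactions, min_support):
    """Build a PPC-tree and return (nlists, order).

    nlists maps each frequent item to a list of (pre, post, count) tuples;
    order maps each frequent item to its rank (0 = most frequent).
    """
    freq = Counter(item for t in transactions for item in set(t))
    frequent = sorted(
        (i for i, c in freq.items() if c >= min_support),
        key=lambda i: (-freq[i], i),
    )
    order = {item: rank for rank, item in enumerate(frequent)}

    # Insert each transaction with its items in descending-frequency order.
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in order), key=order.get):
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    # One depth-first pass assigns both codes and collects the N-Lists.
    nlists = {item: [] for item in frequent}  # dict acts as the hash table
    counters = {"pre": 0, "post": 0}

    def visit(node):
        node.pre = counters["pre"]
        counters["pre"] += 1
        for child in node.children.values():
            visit(child)
        node.post = counters["post"]
        counters["post"] += 1
        if node.item is not None:
            nlists[node.item].append(node)

    visit(root)
    return (
        {i: [(n.pre, n.post, n.count) for n in ns] for i, ns in nlists.items()},
        order,
    )


def intersect(ancestor_nl, descendant_nl):
    """N-List of a 2-itemset: keep ancestor codes, sum descendant counts."""
    merged = {}
    for pre_a, post_a, _ in ancestor_nl:
        for pre_d, post_d, count_d in descendant_nl:
            # A is an ancestor of D iff pre_a < pre_d and post_a > post_d.
            if pre_a < pre_d and post_a > post_d:
                merged[(pre_a, post_a)] = merged.get((pre_a, post_a), 0) + count_d
    return [(p, q, c) for (p, q), c in merged.items()]


def pair_support(nlists, order, x, y):
    """Support of {x, y}: the more frequent item sits higher in the tree."""
    a, d = (x, y) if order[x] < order[y] else (y, x)
    return sum(c for _, _, c in intersect(nlists[a], nlists[d]))
```

Calling `build_nlists` on a small transaction list and then `pair_support(nlists, order, "a", "b")` gives the number of transactions that contain both items, without rescanning the data.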

Keywords: frequent itemset mining; subsuming index; hash storage; N-List

Classification number: TP301.06 [Automation and Computer Technology: Computer System Architecture]

 
