Parallel Frequent Itemset Mining Algorithm for Massive Data

Cited by: 4

Authors: Ao Mengfei; Shi Hongyan (School of Science, Shenyang University of Technology, Shenyang 110870, China)

Source: Statistics & Decision (《统计与决策》), 2022, No. 18, pp. 48-53 (6 pages)

Funding: National Natural Science Foundation of China (61074005).

Abstract: To address the low efficiency of the traditional serial Eclat algorithm when mining frequent itemsets from massive data, this paper proposes a parallel frequent itemset mining algorithm for massive data, named I-SPEclat. First, the shortcomings of Eclat are addressed by introducing the adjacency matrix of a graph as the data storage structure, which avoids a large number of intersection operations. Second, the Apriori property is used to pre-prune and post-prune candidate itemsets, which reduces the number of useless candidates and saves storage space. Third, the data are partitioned by itemset prefix to balance the workload of each computing node. Finally, the improved Eclat algorithm is parallelized on the Spark distributed computing framework. Experimental results show that I-SPEclat consumes less time and less memory than existing improved Eclat algorithms, and that it scales well across datasets of different sizes.

Keywords: Eclat algorithm; Spark framework; adjacency matrix; pruning optimization

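Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP301.6 [Automation and Computer Technology: Control Science and Engineering]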
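
The abstract names four ingredients: a vertical Eclat search, an adjacency matrix over items, Apriori-based pruning, and prefix-based partitioning. The record gives no algorithmic detail, so the following is only a minimal single-machine sketch of how these ideas might fit together, not the authors' I-SPEclat implementation. All function names (`build_tidsets`, `pair_support_matrix`, `mine_prefix_class`, `eclat`) are hypothetical, and the adjacency matrix is assumed here to hold pairwise co-occurrence counts, stored sparsely as a dictionary.

```python
from collections import defaultdict
from itertools import combinations


def build_tidsets(transactions):
    """Vertical layout used by Eclat: item -> set of transaction ids."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def pair_support_matrix(transactions):
    """Assumed form of the adjacency matrix: co-occurrence counts of item
    pairs, kept sparsely as {(a, b): count} with a < b."""
    counts = defaultdict(int)
    for items in transactions:
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def mine_prefix_class(first, later_items, tidsets, pairs, min_support):
    """Mine every frequent itemset whose smallest item is `first`.
    Each such prefix class is independent of the others."""
    found = {(first,): len(tidsets[first])}

    def extend(prefix, prefix_tids, candidates):
        for idx, item in enumerate(candidates):
            # Pre-pruning (Apriori property): if any pair (p, item) is
            # infrequent, no superset can be frequent, so skip the
            # tidset intersection entirely.
            if any(pairs.get((p, item), 0) < min_support for p in prefix):
                continue
            tids = prefix_tids & tidsets[item]
            # Post-check on the exact support of the candidate.
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                found[itemset] = len(tids)
                extend(itemset, tids, candidates[idx + 1:])

    extend((first,), tidsets[first], later_items)
    return found


def eclat(transactions, min_support):
    """Run all prefix classes sequentially and merge their results."""
    tidsets = build_tidsets(transactions)
    pairs = pair_support_matrix(transactions)
    items = sorted(i for i, t in tidsets.items() if len(t) >= min_support)
    frequent = {}
    for idx, first in enumerate(items):
        frequent.update(
            mine_prefix_class(first, items[idx + 1:], tidsets, pairs, min_support)
        )
    return frequent


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(eclat(data, min_support=3).items()):
        print(itemset, support)
```

Because each call to `mine_prefix_class` depends only on shared read-only structures, the loop in `eclat` is the natural place to distribute work: in a Spark setting, the prefix classes could be spread across workers and their results merged. How the paper actually balances load across nodes is not stated in this record.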
