Authors: Ao Mengfei (敖孟飞); Shi Hongyan (石鸿雁)[1] (School of Science, Shenyang University of Technology, Shenyang 110870, China)
Source: Statistics & Decision (《统计与决策》), 2022, No. 18, pp. 48-53 (6 pages)
Funding: National Natural Science Foundation of China (61074005).
Abstract: In frequent itemset mining, the traditional serial Eclat algorithm is inefficient on massive data. This paper proposes a parallel frequent itemset mining algorithm for massive data, named I-SPEclat. First, it addresses shortcomings of Eclat by adopting the adjacency matrix of a graph as the data storage structure, which avoids a large number of intersection operations. Second, it applies the Apriori property to pre-prune and post-prune candidate itemsets, reducing the number of useless candidates and saving storage space. Third, it partitions the data by itemset prefix to balance the workload across computing nodes. Finally, it parallelizes the improved Eclat algorithm on the Spark distributed computing framework. Experimental results show that I-SPEclat consumes less time and memory than existing improved Eclat algorithms and scales well on datasets of different sizes.
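The abstract names four ingredients but not their exact form, so the following is only a minimal single-machine sketch under stated assumptions, not the authors' I-SPEclat code. It assumes (1) the "adjacency matrix" is an upper-triangular matrix of 2-itemset co-occurrence counts filled in one pass, so infrequent pairs never need a tidset intersection; (2) pre-pruning means skipping an intersection when the pair formed by the two last items is infrequent (by the Apriori property, every subset of a frequent itemset must be frequent); (3) post-pruning means discarding an intersection whose support falls below the threshold; and (4) prefix partitioning means greedily assigning each single-item prefix class to the least-loaded of several hypothetical workers, weighted by its number of frequent pair extensions. A plain loop stands in for Spark. All function names (`pair_matrix`, `mine_class`, `partition_by_prefix`, `parallel_eclat_sketch`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pair_matrix(transactions, items):
    """Assumed 'adjacency matrix': one pass fills an upper-triangular matrix
    where matrix[i][j] (i < j) is the support of {items[i], items[j]}."""
    index = {item: k for k, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in items]
    for t in transactions:
        ids = sorted(index[x] for x in set(t) if x in index)
        for i, j in combinations(ids, 2):
            matrix[i][j] += 1
    return matrix, index


def mine_class(prefix, members, min_sup, pair_ok, out):
    """Depth-first Eclat over one prefix equivalence class.
    members: list of (item, tidset) pairs, each extending `prefix`."""
    for k, (item, tids) in enumerate(members):
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        children = []
        for other, other_tids in members[k + 1:]:
            if not pair_ok(item, other):   # assumed pre-pruning: no intersection needed
                continue
            common = tids & other_tids
            if len(common) >= min_sup:     # assumed post-pruning: drop infrequent result
                children.append((other, common))
        if children:
            mine_class(itemset, children, min_sup, pair_ok, out)


def partition_by_prefix(items, pair_ok, workers):
    """Hypothetical load balancing: weight each prefix class by its number of
    frequent pair extensions, then give it to the least-loaded worker."""
    weight = {a: sum(1 for b in items if b > a and pair_ok(a, b)) for a in items}
    loads, plan = [0] * workers, [[] for _ in range(workers)]
    for a in sorted(items, key=lambda x: -weight[x]):
        w = loads.index(min(loads))
        plan[w].append(a)
        loads[w] += weight[a]
    return plan


def parallel_eclat_sketch(transactions, min_sup, workers=2):
    """Return {itemset tuple: support} for all frequent itemsets."""
    tidsets = defaultdict(set)              # vertical layout: item -> transaction ids
    for tid, t in enumerate(transactions):
        for x in set(t):
            tidsets[x].add(tid)
    items = sorted(x for x, s in tidsets.items() if len(s) >= min_sup)
    matrix, index = pair_matrix(transactions, items)

    def pair_ok(x, y):
        i, j = sorted((index[x], index[y]))
        return matrix[i][j] >= min_sup

    result = {(x,): len(tidsets[x]) for x in items}
    for worker_items in partition_by_prefix(items, pair_ok, workers):
        for a in worker_items:              # each worker would run this loop independently
            members = [(b, tidsets[a] & tidsets[b])
                       for b in items if b > a and pair_ok(a, b)]
            mine_class((a,), members, min_sup, pair_ok, result)
    return result


if __name__ == "__main__":
    txns = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, sup in sorted(parallel_eclat_sketch(txns, 2).items()):
        print(itemset, sup)
```

Each prefix class is mined without reading any other class, because the only cross-class information the pruning needs is the pair matrix. That independence is what would let the classes run on separate nodes; in a real Spark job one would presumably broadcast the matrix (`SparkContext.broadcast`) and route each class to its own partition, but that mapping is an assumption here, not a detail given in the abstract.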