Authors: Zheng Jingyi (郑静益); Deng Xiaoheng (邓晓衡) [1]
Affiliation: [1] College of Software, Central South University, Changsha 410075, China
Source: Application Research of Computers (《计算机应用研究》), 2019, No. 4, pp. 1059-1063, 1067 (6 pages)
Funding: Central South University Graduate Research and Innovation Project (2017zzts612)
Abstract: Apriori is one of the most widely used algorithms for frequent itemset mining. However, it scans the complete dataset in every iteration, which severely limits its efficiency and makes it hard to parallelize, and the problem grows worse as datasets continue to grow. To address this, the paper proposes IEBDA, a parallel Apriori method based on item encoding and the Spark computing framework. Item encoding preserves the full itemset information, so frequent itemsets can be mined without repeatedly scanning the complete dataset, while Spark broadcast variables are used to parallelize the computation. Compared with other distributed Apriori algorithms on datasets of different sizes, IEBDA shows a clear speedup in the iterations after the first one. The results indicate that the algorithm improves the efficiency of multi-iteration frequent itemset mining in big-data environments.
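The record does not describe IEBDA's actual encoding, so the following is only a minimal single-machine sketch of the general idea of mining without rescanning the data. It assumes a bitmap encoding (one integer bitmask of transaction ids per item, a common vertical representation) and counts support by intersecting bitmasks. The function names `encode_items` and `frequent_itemsets`, the bitmap scheme, and the use of an absolute `min_support` count are all illustrative assumptions, not the authors' method, and no Spark code is shown.

```python
from itertools import combinations


def encode_items(transactions):
    """Single scan: map each item to an int bitmask (assumed encoding).

    Bit i of an item's mask is set when transaction i contains that item.
    """
    codes = {}
    for tid, items in enumerate(transactions):
        for item in items:
            codes[item] = codes.get(item, 0) | (1 << tid)
    return codes


def support(mask):
    """Number of transactions recorded in a bitmask."""
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search that counts support from bitmasks only."""
    codes = encode_items(transactions)  # the only pass over the raw data

    # Level 1: frequent single items and their masks.
    level = {
        frozenset([item]): mask
        for item, mask in codes.items()
        if support(mask) >= min_support
    }
    result = {}
    k = 1
    while level:
        result.update({itemset: support(m) for itemset, m in level.items()})
        k += 1
        next_level = {}
        for (a, mask_a), (b, mask_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != k or candidate in next_level:
                continue
            # Apriori pruning: every (k-1)-subset must be frequent.
            if any(candidate - {x} not in level for x in candidate):
                continue
            # AND of the masks = transactions containing every item.
            mask = mask_a & mask_b
            if support(mask) >= min_support:
                next_level[candidate] = mask
        level = next_level
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(
        frequent_itemsets(data, 3).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), count)
```

In a Spark setting, one plausible arrangement would be to share the encoded table with workers through a broadcast variable (for example via `SparkContext.broadcast`) and evaluate candidates in parallel; whether IEBDA is organized this way is not stated in the record.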