I-Apriori: An Improved Apriori Algorithm Based on Spark Platform

Cited by: 8

Source: Science Technology and Engineering (《科学技术与工程》), 2017, No. 27, pp. 243-248 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (61402529)

Abstract: The Apriori algorithm generates a large number of candidate itemsets in its second iteration. To address this drawback, the Apriori algorithm is parallelized on the Spark big data framework, and an improved Spark-based Apriori algorithm, I-Apriori, is proposed. The algorithm stores frequent itemsets in Spark's in-memory abstraction, the resilient distributed dataset (RDD). In the second iteration it stores the frequent 1-itemsets in an improved Bloom filter, which eliminates candidate set generation, reduces the number of database scans, and improves the efficiency of the algorithm. Experimental results show that, compared with the Apriori algorithm implemented on the Spark platform, I-Apriori performs better and can substantially improve the efficiency of association rule mining on big data.
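The second-iteration idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the record does not describe the "improved" Bloom filter, so a plain Bloom filter stands in for it, Spark is left out, and the names `BloomFilter`, `frequent_items`, and `frequent_pairs` are hypothetical. Instead of joining frequent 1-itemsets into candidate pairs up front, each transaction is pruned to the items the filter accepts, and pairs are counted directly.

```python
import hashlib
from collections import Counter
from itertools import combinations


class BloomFilter:
    """Plain Bloom filter (assumption: the paper's improved variant is not shown)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive one position per hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def frequent_items(transactions, min_support):
    """First pass: count each item once per transaction, keep the frequent ones."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def frequent_pairs(transactions, min_support):
    """Second pass without an explicit candidate set."""
    bloom = BloomFilter()
    for item in frequent_items(transactions, min_support):
        bloom.add(item)

    pair_counts = Counter()
    for t in transactions:
        # Keep only items the filter accepts, then count every pair among them.
        kept = sorted(item for item in set(t) if item in bloom)
        pair_counts.update(combinations(kept, 2))

    # A false positive can let an infrequent item through, but any pair that
    # contains it cannot reach min_support, so this final filter removes it.
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
pairs = frequent_pairs(transactions, min_support=3)
```

On Spark, the per-transaction loop would plausibly map onto a `flatMap` over the transaction RDD followed by `reduceByKey` and `filter`, with the filter broadcast to the workers; that mapping is an assumption here, not something the abstract states.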

Keywords: in-memory computing framework; data mining; association rules; algorithm; Bloom filter

Classification: TP311.1 [Automation and Computer Technology: Computer Software and Theory]
