IABS: An Improved Apriori Algorithm Based on Spark  (Times cited: 12)

English title: IABS: parallel improved Apriori algorithm based on Spark

Authors: Yan Mengjie (闫梦洁), Luo Jun (罗军)[1], Liu Jianying (刘建英)[1], Hou Chuanwang (侯传旺)

Affiliation: [1] College of Computer, National University of Defense Technology

Source: Application Research of Computers (《计算机应用研究》), 2017, No. 8, pp. 2274-2277 (4 pages)

Funding: National High Technology Research and Development Program of China ("863" Program), grant 2014AA01A302

Abstract: The Apriori algorithm is one of the most classical algorithms in association rule mining, and its core problem is generating frequent itemsets. The classical Apriori algorithm must scan the transaction database multiple times and must generate candidate itemsets. To address these problems, this paper first optimizes Apriori by transforming its storage structure and eliminating the candidate-itemset generation process. Second, because data volumes keep growing in the big-data era and traditional algorithms face severe challenges, the paper combines the optimized Apriori algorithm with the Spark platform and proposes IABS (improved Apriori algorithm based on Spark), which takes full advantage of Spark features such as in-memory computation and resilient distributed datasets (RDDs). Compared with existing algorithms of the same kind, IABS is shown to have good data scalability (sizeup) and node scalability, and it achieves an average performance improvement of 23.88% across various datasets; the improvement becomes more pronounced as the data volume grows.
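The abstract does not spell out which storage structure IABS switches to. One common way to avoid rescanning the transaction database is a vertical (Eclat-style) layout that maps each item to the set of IDs of the transactions containing it (its tidset); the support count of an itemset is then the size of the intersection of its items' tidsets. The sketch below assumes that layout and imitates Spark's per-partition map stage and a key-based merge in plain Python over in-memory partitions. The function names (`partition_tidsets`, `merge_tidsets`, `frequent_itemsets`) and the sample data are hypothetical illustrations, not taken from the paper.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumption: vertical tidset layout; this is not necessarily the structure IABS uses.
Transaction = Tuple[int, List[str]]  # (transaction ID, items)


def partition_tidsets(partition: List[Transaction]) -> Dict[str, Set[int]]:
    """Map-like step: turn one horizontal partition into item -> {transaction IDs}."""
    tidsets: Dict[str, Set[int]] = defaultdict(set)
    for tid, items in partition:
        for item in items:
            tidsets[item].add(tid)
    return tidsets


def merge_tidsets(a: Dict[str, Set[int]], b: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Reduce-like step: union the tidsets of the same item from two partitions."""
    merged: Dict[str, Set[int]] = defaultdict(set)
    for source in (a, b):
        for item, tids in source.items():
            merged[item] |= tids  # copies into a fresh set, so inputs are not aliased
    return merged


def frequent_itemsets(partitions: List[List[Transaction]],
                      min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is >= min_support."""
    # The only pass over the raw transactions: build the vertical layout.
    vertical = reduce(merge_tidsets, (partition_tidsets(p) for p in partitions), {})
    result: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], prefix_tids: Set[int],
               candidates: List[Tuple[str, Set[int]]]) -> None:
        # Grow `prefix` with each later item; support = size of the tidset intersection.
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) < min_support:
                continue  # anti-monotonicity: no superset of this itemset can be frequent
            itemset = prefix | {item}
            result[itemset] = len(new_tids)
            extend(itemset, new_tids, candidates[i + 1:])

    # Frequent single items, sorted so each itemset is generated exactly once.
    singles = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), set(), singles)
    return result


# Hypothetical sample data split across two "partitions".
SAMPLE_PARTITIONS: List[List[Transaction]] = [
    [(0, ["bread", "milk"]), (1, ["bread", "diaper", "beer", "eggs"])],
    [(2, ["milk", "diaper", "beer", "cola"]),
     (3, ["bread", "milk", "diaper", "beer"]),
     (4, ["bread", "milk", "diaper", "cola"])],
]

if __name__ == "__main__":
    found = frequent_itemsets(SAMPLE_PARTITIONS, min_support=3)
    for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

On this sample data with a minimum support count of 3, the sketch reports four frequent single items and four frequent pairs (for example, {beer, diaper} with support 3), and no triple reaches the threshold. Because support comes from set intersections, the raw transactions are read only once; in a real Spark job the per-partition and merge steps would roughly correspond to RDD transformations followed by a key-based reduction such as `reduceByKey`.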

Keywords: Apriori algorithm; frequent itemsets; storage structure transformation; Spark; in-memory computing

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
