检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]国防科学技术大学计算机学院
出 处:《计算机应用研究》2017年第8期2274-2277,共4页Application Research of Computers
基 金:国家"863"计划资助项目(2014AA01A302)
摘 要:Apriori算法是关联规则挖掘中最经典的算法之一,其核心问题是频繁项集的获取。针对经典Apriori算法存在的需多次遍历事务数据库及需产生候选项集等问题,首先通过转换存储结构、消除候选集产生过程等方法对Apriori算法进行优化;同时,随着大数据时代的到来,数据量与日俱增,传统算法面临巨大挑战,将优化的Apriori与Spark相结合,充分利用Spark的内存计算、弹性分布式数据集等优势,提出了IABS(improved Apriori algorithm based on Spark)。通过与已有的同类算法进行比较,IABS的数据可扩展性和节点可扩展性得以验证,并且在多种数据集上平均获得了23.88%的性能提升,尤其随着数据量的增长,性能提升更加明显。Apriori algorithm is one of the most classical algorithm in association rule mining, the core problem is the genera- tion process of frequent itemsets. Firstly, aimed at the existing problems of classical Apriori algorithm, such as it needed to scan the transaction global database for several times and needed to generate candidate itemsets, this paper optimized it by transforming storage structure and eliminating the process of candidate itemsets generation. Then, with the advent of the era of big data, data volume rises with the day, classical Apriori algorithm faces severe challenge. Based on the improved Apriori al- gorithm and combined with Spark platform, this paper proposed the IABS algorithm, which made full use of Spark, such as in- memory computation, resilient distributed datasets. Compared with already existing similar algorithms, the sizeup and node salability of IABS are validated, as well as, IABS achieves 23.88% performance improvement in average for various bench- marks. Especially, as the growth of data, its performance improvement is more obvious.
关 键 词:APRIORI算法 频繁项集 存储结构转换 SPARK 内存计算
分 类 号:TP301.6[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.3