Optimization of association rules algorithm in Spark platform  Cited by: 4

Authors: LIANG Ai-yun; YUAN Ding[1]; YAN Qing[1]; LIU Xiao-jiu (School of Computer Science, Sichuan Normal University, Chengdu 610101, China)

Affiliation: [1] School of Computer Science, Sichuan Normal University, Chengdu, Sichuan 610101, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2018, No. 12, pp. 3692-3699 (8 pages)

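Funding: National Key Technology R&D Program of China (2014BAH11F01); National Natural Science Foundation of China (61373163); Sichuan Provincial Key Laboratory of Visual Computing and Virtual Reality (PJ2012002)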

Abstract: Porting the traditional association rule algorithm to Spark takes advantage of the platform's high-speed computing capability and improves the algorithm's running efficiency to some extent, but the algorithm's inherent problems remain: heavy system I/O load, high resource consumption, and large storage overhead. To address this, a matrix-based parallel optimization algorithm, Apriori_MC_SP, is proposed. A matrix representation is introduced to reduce the number of scans of the transaction database, and Spark's in-memory resilient distributed datasets (RDDs) are used to store the transaction Boolean matrix and the frequent itemsets. Compared with the traditional Apriori algorithm, the optimized algorithm reduces the number of accesses to the transaction database and simplifies Apriori's self-join and pruning steps. Experimental results show that the proposed scheme speeds up association mining while producing the same output.
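The Boolean-matrix idea in the abstract can be illustrated with a small single-machine sketch. This is not the authors' Apriori_MC_SP and it does not use Spark; it only assumes that each item is stored as one Boolean column (packed here into a Python integer bitmask), so that the support of an itemset is the number of set bits in the AND of its columns and the raw transactions are read only once. The function names `build_boolean_matrix`, `support`, and `frequent_itemsets` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def build_boolean_matrix(transactions):
    """Return {item: bitmask}; bit t is 1 when transaction t contains the item."""
    columns = {}
    for t, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << t)
    return columns


def support(mask):
    """Number of transactions whose bit is set in the column mask."""
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori over Boolean columns (illustrative sketch only)."""
    # The only pass over the raw transaction data.
    columns = build_boolean_matrix(transactions)
    current = {
        frozenset([item]): mask
        for item, mask in columns.items()
        if support(mask) >= min_support
    }
    result = {}
    k = 1
    while current:
        result.update({s: support(m) for s, m in current.items()})
        k += 1
        candidates = {}
        # Self-join: combine two frequent (k-1)-itemsets into a k-itemset.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k or union in candidates:
                continue
            # Pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                # Support comes from ANDing columns, not from rescanning data.
                mask = current[a] & current[b]
                if support(mask) >= min_support:
                    candidates[union] = mask
        current = candidates
    return result
```

In a Spark setting, one would presumably keep the columns and the per-level frequent itemsets in RDDs, as the abstract describes, rather than in local dictionaries; how the paper partitions the matrix is not stated in this record.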

Keywords: Spark platform; Apriori algorithm; parallelization; Boolean matrix; resilient distributed dataset (RDD)

Classification: TP301.6 [Automation and Computer Technology: Computer Systems Architecture]

 
