Optimization and implementation of parallel FP-Growth algorithm based on Spark  Cited by: 9


Authors: GU Junhua; WU Junyan[2]; XU Xinyun; XIE Zhijian; ZHANG Suqi

Affiliations: [1] School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China; [2] Hebei Province Key Laboratory of Big Data Computing (Hebei University of Technology), Tianjin 300401, China; [3] School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China

Source: Journal of Computer Applications (《计算机应用》), 2018, No. 11, pp. 3069-3074 (6 pages)

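Funding: Hebei Province Science and Technology Plan Project (17210305D); Tianjin Science and Technology Plan Project (16ZXHLSF0023); Tianjin Science and Technology Plan Project (15ZXHLGX00130); Natural Science Foundation of Tianjin (15JCQNJC00600)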

Abstract: To further improve the execution efficiency of the Frequent Pattern-Growth (FP-Growth) algorithm on the Spark platform, a new Spark-based parallel FP-Growth algorithm named BFPG (Better Frequent Pattern-Growth) is proposed. First, the F-List grouping strategy is improved by taking into account both the size of the Frequent Pattern-Tree (FP-Tree) and the amount of computation in each partition, so that the total load of every partition is approximately equal. Second, the dataset partitioning strategy is optimized by creating a list called P-List, which reduces the number of traversals and thereby lowers the time complexity. Experimental results show that BFPG improves the mining efficiency of the parallel FP-Growth algorithm and scales well.
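The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' BFPG implementation, since the abstract gives neither the load formula nor the exact layout of P-List. The sketch assumes that the load of an item can be estimated as its support times its position in the F-List (a placeholder), that groups are filled greedily, and that P-List is an item-to-group lookup table. All function names are hypothetical.

```python
import heapq
from collections import Counter


def build_f_list(transactions, min_support):
    """Frequent items sorted by descending support (ties broken by item name)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    return sorted(frequent, key=lambda x: (-x[1], x[0]))


def balanced_groups(f_list, num_groups):
    """Greedy grouping: heaviest item first, always into the lightest group.

    ASSUMPTION: load = support * (rank + 1) is a placeholder estimate,
    not the load formula used in the paper.
    Returns a dict item -> group id (playing the role of P-List here).
    """
    loads = [(support * (rank + 1), item)
             for rank, (item, support) in enumerate(f_list)]
    heap = [(0, gid) for gid in range(num_groups)]  # (total load, group id)
    p_list = {}
    for load, item in sorted(loads, key=lambda x: (-x[0], x[1])):
        total, gid = heapq.heappop(heap)  # lightest group so far
        p_list[item] = gid
        heapq.heappush(heap, (total + load, gid))
    return p_list


def group_dependent_transactions(transaction, rank, p_list):
    """Emit (group id, prefix) pairs in one right-to-left pass.

    Each item's group is found by a single dictionary lookup, so the
    group lists do not have to be scanned for every item.
    """
    ordered = sorted((i for i in set(transaction) if i in rank), key=rank.get)
    seen, out = set(), []
    for pos in range(len(ordered) - 1, -1, -1):
        gid = p_list[ordered[pos]]
        if gid not in seen:
            seen.add(gid)
            out.append((gid, ordered[: pos + 1]))
    return out
```

In a Spark job, a function like `group_dependent_transactions` would typically be applied inside a `flatMap` over the transaction RDD, with the lookup table broadcast to the executors. That wiring is omitted here to keep the sketch runnable without a cluster.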

Keywords: big data platform; association rules; frequent itemsets; Frequent Pattern-Growth (FP-Growth) algorithm; Spark

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
