Optimization and implementation of parallel FP-Growth algorithm based on Spark    Times cited: 9

Authors: GU Junhua, WU Junyan [2], XU Xinyun, XIE Zhijian, ZHANG Suqi

Affiliations: [1] School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China; [2] Hebei Province Key Laboratory of Big Data Computing (Hebei University of Technology), Tianjin 300401, China; [3] School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China

Source: Journal of Computer Applications (《计算机应用》), 2018, No. 11, pp. 3069-3074 (6 pages)

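Funding: Hebei Province Science and Technology Plan Project (17210305D); Tianjin Science and Technology Plan Project (16ZXHLSF0023); Tianjin Science and Technology Plan Project (15ZXHLGX00130); Tianjin Natural Science Foundation (15JCQNJC00600)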

Abstract: To further improve the execution efficiency of the Frequent Pattern-Growth (FP-Growth) algorithm on the Spark platform, a new Spark-based parallel FP-Growth algorithm, named BFPG (Better Frequent Pattern-Growth), was proposed. First, the F-List grouping strategy was improved by taking into account both the size of each Frequent Pattern-Tree (FP-Tree) and the amount of computation in each partition, so that the total load of every partition is approximately equal. Second, the data set partitioning strategy was optimized by creating a list, P-List, which reduces the number of traversals and thereby lowers the time complexity. Experimental results show that BFPG improves the mining efficiency of parallel FP-Growth and that the algorithm scales well.
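The two optimizations in the abstract (load-balanced F-List grouping and a P-List that speeds up data set partitioning) can be illustrated with a minimal single-machine sketch. The abstract does not give the paper's cost model or data structures, so everything below is an assumption: the per-item load is estimated as `support * (rank + 1)` as a rough proxy for the size of the item's conditional pattern base, items are assigned to partitions with a standard greedy "least-loaded partition first" heuristic, and the P-List is modeled as a hypothetical dictionary mapping each frequent item to its partition ID. The helper names `build_f_list`, `balanced_groups`, and `partition_transaction` are likewise hypothetical. With the lookup table, each transaction is scanned once from right to left, and each partition receives the prefix that ends at its deepest item, which is the group-dependent transaction used in parallel FP-Growth.

```python
from collections import Counter
import heapq


def build_f_list(transactions, min_support):
    """F-List: frequent items sorted by descending support (ties broken by item name)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return frequent  # list of (item, support)


def balanced_groups(f_list, num_partitions):
    """Assign items to partitions so per-partition load sums are roughly equal.

    ASSUMPTION: load(item) = support * (rank + 1); this is an illustrative cost
    model, not necessarily the one used by BFPG.
    """
    loads = [(support * (rank + 1), item) for rank, (item, support) in enumerate(f_list)]
    loads.sort(key=lambda li: (-li[0], li[1]))  # heaviest items first
    heap = [(0, pid) for pid in range(num_partitions)]  # (current load, partition id)
    heapq.heapify(heap)
    p_list = {}  # hypothetical "P-List": item -> partition id
    for load, item in loads:
        total, pid = heapq.heappop(heap)  # least-loaded partition so far
        p_list[item] = pid
        heapq.heappush(heap, (total + load, pid))
    return p_list


def partition_transaction(transaction, rank, p_list):
    """Emit {partition id: prefix} for one transaction in a single right-to-left pass."""
    items = sorted((i for i in set(transaction) if i in rank), key=rank.get)
    emitted = {}
    for pos in range(len(items) - 1, -1, -1):
        pid = p_list[items[pos]]  # O(1) lookup instead of scanning every group
        if pid not in emitted:
            # Deepest item of this partition seen first: send its whole prefix.
            emitted[pid] = items[: pos + 1]
    return emitted


# Usage on a toy data set.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c"], ["a", "b", "c", "d"]]
f_list = build_f_list(transactions, min_support=2)
rank = {item: r for r, (item, _) in enumerate(f_list)}
p_list = balanced_groups(f_list, num_partitions=2)
for t in transactions:
    print(partition_transaction(t, rank, p_list))
```

On a Spark cluster, the same idea would plausibly map to broadcasting the lookup table, running `partition_transaction` inside a `flatMap` over the transaction RDD, and grouping the emitted prefixes by partition ID before building a local FP-Tree per partition; this mapping is also an assumption rather than the paper's published code.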

Keywords: big data platform; association rules; frequent itemsets; Frequent Pattern-Growth (FP-Growth) algorithm; Spark

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
