Distributed FP-Growth Algorithm Based on Hadoop Under Massive Data (cited by: 4)

Authors: ZHU Haodong (朱颢东) [1]; XUE Xiaobo (薛校博); LI Hongchan (李红婵) [1]; MENG Yinghui (孟颍辉) (School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China)

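Affiliation: [1] School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, Henan, China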

Source: Journal of Light Industry (《轻工学报》), 2018, No. 5, pp. 97-102, 108 (7 pages)

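Funding: National Natural Science Foundation of China (61501405); Henan Province Science and Technology Program (152102210149; 152102210357); Zhengzhou University of Light Industry Young Backbone Teacher Training Program (XGGJS02); Zhengzhou University of Light Industry Graduate Science and Technology Innovation Fund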

Abstract: To address association mining in big-data environments, the traditional FP-Growth algorithm is restructured for distributed execution: the database is scanned twice, and transactions are assigned to mutually independent data partitions. On this basis, a distributed FP-Growth algorithm built on the Hadoop framework is proposed for mining frequent patterns (FP) from massive data. Simulation results show that, as the volume of data processed grows, the algorithm's advantages over the traditional algorithm in running time and memory consumption become increasingly pronounced. When the data volume reaches 700,000 records, the algorithm saves about 2/3 of the running time of the traditional algorithm, while its memory consumption is only 1/5 of that of the traditional algorithm. This indicates that, when processing massive data, the algorithm can significantly improve the efficiency of frequent-pattern mining and reduce memory consumption.

Keywords: FP-Growth algorithm; Hadoop; data partitioning; distributed computing

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
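
Illustrative sketch: the abstract describes the method only at a high level (two database scans, with transactions routed to mutually independent partitions), so the following is a minimal in-memory Python illustration of that general idea, not the authors' Hadoop implementation. The function names, the rule that assigns an item to a partition (`rank % num_groups`), and the brute-force subset counting that stands in for building a local FP-tree in each partition are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def first_scan(transactions, min_support):
    """Scan 1: count item supports and rank the frequent items."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in counts.items() if c >= min_support]
    # Descending support; ties broken by item name for determinism.
    frequent.sort(key=lambda i: (-counts[i], i))
    return {item: rank for rank, item in enumerate(frequent)}


def second_scan(transactions, rank, num_groups):
    """Scan 2 (map side): route ordered prefixes to independent partitions."""
    shards = defaultdict(list)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        seen = set()
        # Walk back from the least frequent item; each group receives, once,
        # the longest prefix that ends in one of its own items.
        for pos in range(len(ordered) - 1, -1, -1):
            group = rank[ordered[pos]] % num_groups  # assumed grouping rule
            if group not in seen:
                seen.add(group)
                shards[group].append(ordered[: pos + 1])
    return shards


def mine_shard(group, prefixes, rank, num_groups, min_support):
    """Reduce side: count itemsets whose least frequent item is in `group`.

    Brute-force enumeration replaces the local FP-tree (assumption, and
    exponential in prefix length, so only suitable for tiny examples).
    """
    counts = Counter()
    for prefix in prefixes:
        for pos, item in enumerate(prefix):
            if rank[item] % num_groups != group:
                continue
            head = prefix[:pos]
            for size in range(len(head) + 1):
                for combo in combinations(head, size):
                    counts[frozenset(combo + (item,))] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def distributed_fp_sketch(transactions, min_support, num_groups=2):
    """Run both scans, then mine every partition independently."""
    rank = first_scan(transactions, min_support)
    shards = second_scan(transactions, rank, num_groups)
    result = {}
    for group, prefixes in shards.items():
        result.update(mine_shard(group, prefixes, rank, num_groups, min_support))
    return result
```

Because each partition holds every prefix needed for the itemsets it owns, the partitions can be mined without communicating with one another, which is the property that lets such a scheme map onto independent reduce tasks. Calling `distributed_fp_sketch(transactions, min_support)` on a list of item lists returns a dictionary from frozen itemsets to their support counts.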
