An Algorithm of Mining Constrained Frequent Patterns Based on MapReduce (MapReduce编程模型下的约束频繁模式挖掘算法)

Cited by: 2

Authors: Yan Xiaowu (闫晓妩), Zhang Jifu (张继福)[1], Xun Yaling (荀亚玲)[1], Zhao Xujun (赵旭俊)[1]

Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2015, No. 10, pp. 2257-2261 (5 pages)

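Funding: Supported by the National Natural Science Foundation of China (61272263); the Shanxi Province Youth Fund (20120210154); and the Graduate Science and Technology Innovation Project of Taiyuan University of Science and Technology (20134027)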

Abstract: A constrained frequent pattern is a frequent pattern generated under constraints supplied by the user; mining such patterns is well targeted and efficient. As data volume grows, however, generating constrained frequent patterns incurs high memory usage and high I/O cost, which makes the approach hard to apply to massive, high-dimensional data sets. This paper presents MCFP, a parallel algorithm for mining constrained frequent patterns that is based on the constrained frequent pattern tree (CFP-Tree) and the MapReduce programming model. First, the algorithm uses three pairs of Map and Reduce functions to carry out its main steps: mapping the transactions in the data to frequent-item counts, constructing the CFP-Tree and generating constrained frequent patterns, and aggregating the frequent patterns. Second, data records are migrated according to the support of the frequent items, which balances the load during frequent pattern generation. Finally, experiments on celestial spectrum data validate the effectiveness, scalability, and extensibility of the algorithm.
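The first Map/Reduce pair and the support-based balancing step described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' MCFP implementation: the abstract does not specify the constraint type or the balancing rule, so the item whitelist `allowed`, the greedy heaviest-first assignment, and all function names (`map_count`, `reduce_count`, `balance_groups`) are assumptions made for illustration only.

```python
from collections import Counter
from itertools import chain
import heapq


def map_count(transaction):
    """Map step: emit (item, 1) for each distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]


def reduce_count(pairs, min_support, allowed=None):
    """Reduce step: sum the counts per item and keep only frequent items.

    `allowed` is a hypothetical item-level constraint (a whitelist);
    the paper's actual constraint form is not given in the abstract.
    """
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return {
        item: n
        for item, n in counts.items()
        if n >= min_support and (allowed is None or item in allowed)
    }


def balance_groups(frequent, num_workers):
    """Assumed balancing rule: give the heaviest remaining item
    (by support) to the currently least-loaded worker."""
    heap = [(0, w) for w in range(num_workers)]  # (accumulated load, worker id)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, support in ordered:
        load, worker = heapq.heappop(heap)
        assignment[item] = worker
        heapq.heappush(heap, (load + support, worker))
    return assignment


# Toy transaction database (illustrative data, not from the paper).
transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "d"],
    ["b", "c"],
    ["a", "b", "c", "e"],
]
pairs = chain.from_iterable(map_count(t) for t in transactions)
frequent = reduce_count(pairs, min_support=2, allowed={"a", "b", "c", "d"})
assignment = balance_groups(frequent, num_workers=2)
print(frequent)
print(assignment)
```

In this sketch, support stands in for the amount of work an item's conditional pattern base will generate, which is why it drives the assignment; a real MapReduce job would apply the resulting item-to-worker mapping in its partitioning stage before the CFP-Tree is built on each worker.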

Keywords: constrained frequent pattern; MapReduce programming model; CFP-Tree; support; load balancing

Classification: TP [Automation and Computer Technology]

 
