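Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2015, No. 10, pp. 2257-2261 (5 pages)
Funding: National Natural Science Foundation of China (61272263); Shanxi Province Youth Fund (20120210154); Taiyuan University of Science and Technology Graduate Science and Technology Innovation Project (20134027)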
Abstract: Constrained frequent patterns are frequent patterns generated under user-specified constraints; they are well targeted and efficient to mine. As data volume grows, however, generating them consumes large amounts of memory and incurs high I/O cost, which makes the approach hard to apply to massive, high-dimensional data sets. This paper presents MCFP, a parallel algorithm for mining constrained frequent patterns, based on the constrained frequent pattern tree (CFP-Tree) and built on the MapReduce programming model. First, the algorithm uses three pairs of Map and Reduce functions to carry out its main steps: mapping transactions to frequent-item counts, constructing the CFP-Tree and generating constrained frequent patterns, and aggregating the frequent patterns. Second, data records are migrated according to frequent-item support, which balances the load during pattern generation. Finally, experiments on celestial spectral data validate the algorithm's effectiveness, scalability, and extensibility.
Keywords: constrained frequent pattern; MapReduce programming model; CFP-Tree; support; load balancing
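
The abstract gives no pseudocode, so the following is only a minimal sketch of what the first Map/Reduce pair (frequent-item support counting) and a support-based grouping step might look like. It runs in plain Python rather than on a MapReduce cluster. The function names, the set-membership form of the constraint, and the greedy "heaviest item to lightest group" rule are all assumptions made for illustration; they are not taken from the MCFP paper.

```python
from collections import Counter, defaultdict
from itertools import chain
import heapq


def map_count(transaction, allowed_items):
    """Map step: emit (item, 1) for each distinct item passing the constraint.

    The constraint is modeled as simple set membership (an assumption).
    """
    return [(item, 1) for item in set(transaction) if item in allowed_items]


def reduce_count(pairs, min_support):
    """Reduce step: sum the counts per item, keep items with enough support."""
    totals = Counter()
    for item, count in pairs:
        totals[item] += count
    return {item: n for item, n in totals.items() if n >= min_support}


def balance_groups(frequent, num_groups):
    """Greedy grouping (hypothetical): assign items, highest support first,
    to whichever group currently carries the smallest total support."""
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = defaultdict(list)
    ordered = sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, support in ordered:
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (load + support, g))
    return dict(groups)


# Toy transactions and a hypothetical item constraint.
transactions = [
    ["a", "b", "c"],
    ["a", "c", "d"],
    ["a", "b"],
    ["b", "c", "e"],
]
allowed = {"a", "b", "c", "d"}

pairs = chain.from_iterable(map_count(t, allowed) for t in transactions)
frequent = reduce_count(pairs, min_support=2)
groups = balance_groups(frequent, num_groups=2)
print(frequent)
print(groups)
```

In a real deployment the map and reduce functions would run on separate workers, and the grouping would decide which worker receives the records for each frequent item; here everything runs in one process purely to show the data flow.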