Parallel and incremental updating algorithm for association rules based on MapReduce (基于MapReduce的关联规则并行增量更新算法)

Cited by: 10

Authors: Yang Yong (杨勇) [1], Gao Songsong (高松松) [1]

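Affiliation: [1] Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China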

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, No. 5, pp. 670-678 (9 pages)

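Funding: Natural Science Foundation of Chongqing (CSTC 2007BB2445); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ110522); Research Fund of Chongqing University of Posts and Telecommunications (A2009-26)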

Abstract: In practical association rule mining, rapidly growing data raises both a big data problem and an incremental updating problem. Building on the fast updated frequent pattern tree (FUFP-tree) algorithm, this paper introduces the MapReduce parallel programming model and proposes a parallel incremental updating algorithm for association rules aimed at big data, the parallel fast updated frequent pattern tree (PFUFP-tree). First, the algorithm builds a block index over the original transaction data so that each incremental update scans as little of the original transaction database as possible, which improves mining efficiency. Second, it adopts a dynamic load-balancing item grouping strategy to address the item grouping problem in parallel computation, keeping the load balanced across the nodes of the distributed cluster. Comparative experiments show that the proposed algorithm is effective and efficient, and that it suits big data environments in which data grows dynamically.

Keywords: association rules; big data; incremental updating; MapReduce; fast updated frequent pattern tree (FUFP-tree)

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
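
The abstract mentions an item grouping strategy that keeps the load balanced across cluster nodes but does not spell it out. As a rough illustration of the general idea only, the sketch below uses a generic greedy heuristic (heaviest item first, always into the currently lightest group). It is not the paper's PFUFP-tree strategy: the function names `count_items` and `balanced_groups` are hypothetical, and using an item's support count as its load estimate is an assumption made for this example.

```python
import heapq
from collections import Counter


def count_items(transactions):
    """Count how many transactions contain each item (its support count)."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def balanced_groups(item_loads, num_groups):
    """Greedy grouping: heaviest item first, always into the lightest group.

    item_loads maps item -> estimated load. The load estimate here is an
    assumption (support count), not the estimate used in the paper.
    """
    # Heap entries are (total_load, group_id), so the lightest group pops first.
    heap = [(0, gid) for gid in range(num_groups)]
    heapq.heapify(heap)
    groups = {gid: [] for gid in range(num_groups)}
    # Sort by descending load; ties are broken by item name for determinism.
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, gid = heapq.heappop(heap)
        groups[gid].append(item)
        heapq.heappush(heap, (total + load, gid))
    return groups


# Toy transaction database, invented for illustration.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["a", "e"],
    ["b", "c"],
]
loads = count_items(transactions)
groups = balanced_groups(loads, num_groups=2)
```

In a MapReduce setting, each group of items would typically be handed to one reducer, so evening out the estimated load per group is one way to avoid a single slow node dominating the job.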

 
