A parallel high utility itemset mining algorithm based on Spark  Cited by: 6


Authors: HE Deng-ping [1,2,3], HE Zong-hao, LI Pei-qiang

Affiliations: [1] School of Telecommunication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [2] Research Center of New Telecommunication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [3] Chongqing Information Technology Designing Company Limited, Chongqing 400021, China

Source: Computer Engineering & Science, 2019, No. 10, pp. 1723-1730 (8 pages)

Abstract: Traditional Top-K high utility itemset mining algorithms based on linked-list structures cannot meet mining requirements in big data environments. To address this problem, a parallel high utility itemset mining algorithm based on Spark (STKO) is proposed. First, the TKO algorithm is improved in terms of threshold raising and search-space reduction. Then, on the Spark platform, the original data storage structure is changed and broadcast variables are used to optimize the iterative process, which avoids a large amount of recomputation while applying load balancing to mine Top-K high utility itemsets in parallel. Experimental results show that the proposed parallel algorithm can effectively mine high utility itemsets from large datasets.
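As a rough illustration of the threshold-raising idea behind Top-K high utility itemset mining, the following is a minimal single-machine sketch. It is not the STKO or TKO algorithm: it enumerates candidates by brute force, uses no utility lists or search-space pruning, and does not involve Spark. The toy database `TRANSACTIONS` and the function names are hypothetical, introduced only for this example. The utility of an itemset is taken here as the sum of its items' utilities over all transactions that contain every item, and the internal threshold rises to the k-th best utility seen so far.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# (e.g. quantity * unit profit) within that transaction.
TRANSACTIONS = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 2},
    {"a": 5, "b": 2, "d": 8},
]


def itemset_utility(itemset, transactions):
    """Sum the itemset's utility over every transaction containing all its items."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] for item in itemset)
    return total


def top_k_high_utility(transactions, k):
    """Brute-force Top-K search with a rising minimum-utility threshold."""
    items = sorted({item for t in transactions for item in t})
    heap = []       # min-heap holding the k best (utility, itemset) pairs so far
    threshold = 0   # raised to the k-th best utility once k candidates are kept
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > threshold:
                # Evict the current weakest candidate in favor of the new one.
                heapq.heapreplace(heap, (u, itemset))
            if len(heap) == k:
                threshold = heap[0][0]
    return sorted(heap, reverse=True), threshold


result, final_threshold = top_k_high_utility(TRANSACTIONS, 3)
for utility, itemset in result:
    print(itemset, utility)
```

A real miner would use the raised threshold to prune whole branches of the search space rather than merely to reject individual candidates, which is where most of the speedup comes from.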

Keywords: data mining; high utility itemset; Spark big data framework; parallelization; Top-K

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
