Improved parallel association rules incremental mining algorithm

Cited by: 7

Authors: Mao Yimin [1]; Deng Qianhu; Deng Xiaohong [2]; Liu Wei [2]

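Affiliations: [1] School of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China; [2] College of Applied Science, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China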

Source: Application Research of Computers (《计算机应用研究》), 2021, No. 10, pp. 2974-2980 (7 pages)

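Funding: National Key R&D Program of China (2018YFC1504705); National Natural Science Foundation of China (41562019, 61762046); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ209407).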

Abstract: In big data environments, incremental association rule algorithms based on the Can-tree (canonical order tree) suffer from excessive space consumption by the tree structure, inefficient frequent pattern mining, and insufficient parallel performance on MapReduce clusters. To address these problems, this paper proposes MR-PARIRM (MapReduce-based parallel association rules incremental mining algorithm using rough set and merge pruning). First, a rough set based similar item merge strategy (RS-SIM) merges similar items in the dataset, and the Can-tree is constructed from the merged data, which reduces the space occupied by the tree structure. Second, a merge pruning strategy (MPS) prunes and merges the propagation paths in the tree structure, compressing the frequent pattern search space and thereby speeding up frequent item mining. Finally, a dynamic scheduling strategy (DSS) dynamically schedules the computing tasks in the heterogeneous MapReduce cluster, achieving load balancing and improving the cluster's parallel computing capability. Experimental simulation results show that MR-PARIRM performs relatively well in big data environments and is suitable for parallel processing of large-scale data.

Keywords: Can-tree; rough set; merge pruning; big data; incremental mining

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
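
The abstract builds on the Can-tree, a prefix tree whose paths follow a fixed canonical item order, so new transactions can be added without restructuring existing paths. The following is a minimal sketch of that base structure only, not the paper's MR-PARIRM implementation: RS-SIM, MPS, and DSS are not reproduced. It assumes lexicographic order as the canonical order, and the names `CanNode`, `CanTree`, `insert`, and `support` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CanNode:
    """One node of a canonical-order tree (Can-tree)."""
    item: str
    count: int = 0
    children: dict = field(default_factory=dict)


class CanTree:
    """Prefix tree whose paths follow a fixed canonical item order.

    The order used here (lexicographic) is an assumption. Because it does
    not depend on item frequencies, inserting new transactions never
    requires reordering or rebuilding existing paths.
    """

    def __init__(self):
        self.root = CanNode(item="")
        self.n_transactions = 0

    def insert(self, transaction):
        """Add one transaction; existing paths are only extended or counted."""
        self.n_transactions += 1
        node = self.root
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = CanNode(item=item)
                node.children[item] = child
            child.count += 1
            node = child

    def support(self, itemset):
        """Count the transactions that contain every item in `itemset`."""
        target = sorted(set(itemset))
        if not target:
            return self.n_transactions

        def walk(node, idx):
            total = 0
            for item, child in node.children.items():
                if item == target[idx]:
                    if idx == len(target) - 1:
                        total += child.count
                    else:
                        total += walk(child, idx + 1)
                elif item < target[idx]:
                    # The wanted item may still appear deeper on this path.
                    total += walk(child, idx)
                # item > target[idx]: deeper items are larger, so prune.
            return total

        return walk(self.root, 0)


# Initial batch, then an incremental update on the same tree.
tree = CanTree()
for t in [["b", "a", "c"], ["a", "c"], ["b", "d"]]:
    tree.insert(t)
before = tree.support(["a", "c"])   # 2

tree.insert(["a", "c", "d"])        # incremental insert, no rebuild
after = tree.support(["a", "c"])    # 3
```

The incremental insert touches only the path of the new transaction, which is the property that makes the Can-tree attractive for incremental mining; the space cost of keeping every item in the tree is the problem the abstract's merging and pruning strategies target.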
