Authors: Mao Yimin [1], Deng Qianhu, Deng Xiaohong [2], Liu Wei [2]
Affiliations: [1] School of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China; [2] College of Applied Science, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China
Source: Application Research of Computers (《计算机应用研究》), 2021, No. 10, pp. 2974-2980 (7 pages)
Funding: National Key R&D Program of China (2018YFC1504705); National Natural Science Foundation of China (41562019, 61762046); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ209407).
Abstract: In big data environments, incremental association rule algorithms based on the Can-tree (canonical order tree) suffer from three problems: the tree structure occupies too much space, frequent pattern mining is inefficient, and parallel performance on MapReduce clusters is insufficient. To address these problems, this paper proposes MR-PARIRM (MapReduce-based parallel association rules incremental mining algorithm using rough set and merge pruning). First, a rough-set-based similar item merge strategy, RS-SIM, merges similar items in the dataset, and the Can-tree is constructed from the merged data, which reduces the space occupied by the tree structure. Second, a merge pruning strategy, MPS, prunes and merges the propagation paths in the tree structure, compressing the frequent pattern search space to speed up frequent item mining. Finally, a dynamic scheduling strategy, DSS, dynamically schedules the computing tasks in a heterogeneous MapReduce cluster, achieving load balancing and improving the parallel computing capability of the cluster. Experimental simulation results show that MR-PARIRM performs relatively well in big data environments and is suitable for parallel processing of large-scale data.
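To make the underlying data structure concrete, the following is a minimal Python sketch of a plain Can-tree, not of MR-PARIRM itself: it does not implement RS-SIM, MPS, or DSS, and the abstract gives no details about them. The class names (`CanTreeNode`, `CanTree`), the helper methods, and the choice of lexicographic order as the canonical order are all assumptions made for illustration. The sketch shows why a Can-tree suits incremental mining: because item order is fixed in advance rather than derived from frequency, new transactions can be inserted without restructuring the existing tree.

```python
from collections import Counter


class CanTreeNode:
    """One tree node: an item label, a count, and child nodes keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class CanTree:
    """Prefix tree whose paths follow a fixed canonical item order.

    Assumption: lexicographic order stands in for the canonical order.
    """

    def __init__(self):
        self.root = CanTreeNode()

    def insert(self, transaction):
        # Order items canonically so identical prefixes share one path.
        node = self.root
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = CanTreeNode(item)
                node.children[item] = child
            child.count += 1
            node = child

    def node_count(self):
        # Number of item nodes (the root is not counted).
        total, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            total += len(node.children)
            stack.extend(node.children.values())
        return total

    def item_support(self):
        # Support of each single item, summed over all nodes carrying it.
        support = Counter()
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            support[node.item] += node.count
            stack.extend(node.children.values())
        return support


tree = CanTree()

# Initial batch of transactions (made-up data for illustration).
for transaction in [["b", "a", "c"], ["a", "b"], ["a", "d"]]:
    tree.insert(transaction)

# Incremental batch: inserted into the same tree with no rebuild.
for transaction in [["a", "b", "c"], ["d"]]:
    tree.insert(transaction)

support = tree.item_support()
```

The space concern raised in the abstract is visible even here: item `d` occupies two separate nodes because it appears under different prefixes, which is the kind of redundancy that merging similar items or paths would aim to reduce.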
Classification code: TP311 [Automation and Computer Technology / Computer Software and Theory]