检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]四川师范大学计算机科学学院,成都610101 [2]四川师大科技园发展有限公司,成都610066
出 处:《计算机应用》2016年第12期3280-3284,3291,共6页journal of Computer Applications
基 金:国家科技支撑计划项目(2014BAH11F01;2014BAH11F02);四川省科技支撑计划项目(15GZ0079)~~
摘 要:由候选项集G2生成频繁2-项集岛是关联规则Apriori算法的一个瓶颈。直接哈希修剪(DHP)算法利用一个生成的Hash表见H2减G2中无用的候选项集,以此提高厶的生成效率。但传统DHP算法是一个串行算法,不能有效处理较大规模数据。针对这一问题,提出DHP的并行化算法——H_DHP。首先,对DHP算法并行化策略的可行性进行了理论分析与证明;其次,基于Hadoop平台,把Hash表以的生成以及频繁项集L1、L3~Lk的生成方法进行了并行实现,并借助Hbase数据库生成关联规则。仿真实验结果表明:与传统DHP算法相比,H_DHP算法在数据的处理时间效率、处理数据集的规模大小,以及加速比和可扩展性等方面都有较好的性能。It is a bottleneck of Apriori algorithm for mining association rules that the candidate set C2 is used to generate the frequent 2-item set L2. In the Direct Hashing and Pruning (DHP) algorithm, a generated Hash table H2 is used to delete the unused candidate item sets in C2 for improving the efficiency of generating L2. However, the traditional DI-IP is a serial algorithm, which cannot effectively deal with large scale data. In order to solve the problem, a DHP parallel algorithm, termed H DHP algorithm, was proposed. First, the feasibility of parallel strategy in DHP was analyzed and proved theoretically. Then, the generation method for the Hash table H2 and frequent item sets L1, L3 - Lk was developed in parallel based on Hadoop, and the association rules were generated by Hbase database. The simulation experimental results show that, compared with the DHP algorithm, the H_DHP algorithm has better performance in the processing efficiency of data, the size of the data set, the speedup and scalability.
关 键 词:HADOOP HASH表 APRIORI算法 直接哈希修剪算法
分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.15