检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:石陆魁[1] 张欣[1] 师胜利[2] SHI Lukui;ZHANG Xin;SHI Shengli(School of Computer Science and Software,Hebei University of Technology,Tianjin 300401,China;School of Information Technology,Hebei Normal University,Shijiazhuang 050024,China)
机构地区:[1]河北工业大学计算机科学与软件学院,天津300401 [2]河北师范大学信息技术学院,石家庄050024
出 处:《计算机工程与应用》2018年第13期52-58,110,共8页Computer Engineering and Applications
基 金:河北省自然科学基金(No.F2016202144;No.F2017202145)
摘 要:PFP_Growth算法是FP_Growth算法在Hadoop平台上基于MapReduce的并行化,该算法在分组过程中没有考虑负载均衡问题,导致各个节点完成任务时间不一致,甚至相差很大,从而降低了算法的执行效率。为了提高算法的执行效率,提出了一种基于Spark的RPFP算法,该算法对PFP_Growth算法在均衡分组和降低时间复杂度两方面进行优化,通过把负载大的项放在负载总和最小的组里面实现均衡分组,通过在链头表结构中加入一张哈希表达到快速访问元素地址的目的,从而降低时间复杂度。实验结果表明,RPFP通过优化PFP算法,有效提高了频繁项集的挖掘效率。PFP_Growth algorithm is the parallelization of FP_Growth algorithm on the Hadoop platform based on MapReduce. The algorithm does not consider the balance of the load while grouping the transaction set, which causes the time inconsistency of different nodes to accomplish the tasks and even a bigger difference. Thus, it reduces the efficiency of the algorithm. To improve the efficiency of the algorithm, this paper proposes a Spark-based RPFP algorithm, which optimizes PFP_Growth algorithm through balancing the groups and reducing the time complexity. To balance the group,the large load is placed into the group with the smallest total load. The address of the element is fast accessed by adding a Hash table to the head table, which reduces the time complexity. Experimental results show that RPFP algorithm effectively improves the mining efficiency of the frequent itemsets.
关 键 词:FP_GROWTH算法 频繁项集挖掘 负载均衡 链头表结构 SPARK
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:3.145.115.135