检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:李洁 朱洪亮[1,2] 陈玉玲 辛阳[1,2] LI Jie;ZHU Hongliang;CHEN Yuling;XIN Yang(School of Cyberspace Security,Beijing University of Posts and Telecommunications,Beijing 100876,China;Guizhou Provincial Key Laboratory of Public Big Data,Guizhou University,Guiyang 550025,China)
机构地区:[1]北京邮电大学网络空间安全学院,北京100876 [2]贵州大学贵州省公共大数据重点实验室,贵阳550025
出 处:《计算机工程》2020年第11期109-116,共8页Computer Engineering
基 金:国家重点研发计划(2017YFB0802300);贵州省科技重大专项(20183001);贵州省公共大数据重点实验室开放课题(2018BDKFJJ008,2018BDKFJJ020)。
摘 要:Apriori算法能够挖掘事物之间的关联关系,但传统Apriori算法每计算一次候选集的支持度,都需要遍历原始事务数据库,多次扫描数据库导致其效率较低。为此,提出一种基于哈希存储与事务加权的改进算法。通过哈希存储的去重特性对事务进行去重,以减少冗余计算。将项目与项集的映射存储到哈希结构中,避免计算候选集的支持度时多次扫描事务数据库。同时开启多个线程,并行计算候选集的支持度,从而提高Apriori算法的运行效率。在开源数据集上的实验结果表明,当数据集中事务条数以及重复事务数越多时,该算法相较于传统Apriori算法的性能提升越明显,其运行时间与FP-Growth算法相近但避免了FP-Growth算法内存占用过大的问题。The Apriori algorithm can mine the association relationships between things,but the traditional Apriori algorithm needs to traverse the original transaction database every time the support of the candidate set is calculated,which reduces the efficiency of the algorithm.To address the problem,this paper proposes an improved algorithm based on hash storage and transaction weighting.The algorithm uses the deduplication feature of hash storage to deduplicate the transactions to reduce redundant calculations.At the same time,the mapping between the transaction set and the itemset is stored in the hash structure to avoid scanning the transaction database for multiple times during the calculation of the support of the candidate set.In addition,the support of the candidate set is calculated in parallel using multiple threads to improve the efficiency of the Apriori algorithm.Experimental results on open-source datasets show that the performance improvement of the proposed algorithm over the traditional Apriori algorithm increases with the number of transactions and repeated transactions in the dataset.Its running time is similar to that of the FP-Growth algorithm while the excessive memory consumption is avoided.
关 键 词:关联规则 频繁项集 哈希存储 事务加权 并行计算
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.229