Authors: LI Chengyan (李成严)[1]; LI Xinyu (李鑫宇); ZHANG Lei (张磊)[1]; WANG Guangze (王广泽)
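Affiliations: [1] School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China; [2] Harbin University of Science and Technology Library, Harbin University of Science and Technology, Harbin 150080, China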
Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), 2023, No. 6, pp. 111-120 (10 pages)
Funding: Natural Science Foundation of Heilongjiang Province (LH2021F032).
Abstract: Association rule mining is mainly used to discover knowledge hidden in data. Weighted association rule mining can more effectively mine rules whose items differ in importance. Manually assigned weights, however, are subjectively arbitrary and do not make full use of the characteristics of the data itself, and serial algorithms cannot handle large data sets. To address these problems, a parallel mining algorithm for independent-probability fully weighted association rules is proposed. The algorithm builds a fully weighted model based on the probability with which each item appears in the current data set, so as to mine more of the association rules that users expect. Prefix partitioning and bitmap storage are used to reduce the high time cost of weighted frequent itemset filtering and candidate weighted frequent itemset generation, respectively. Distributed parallel computing is introduced and the algorithm is implemented on the Spark framework, so that weighted association rules can be mined efficiently in a big-data environment. The model and algorithm are verified on numerical examples; the results show that the algorithm obtains more hidden information while maintaining superior time efficiency.
Classification code: TP399 [Automation and Computer Technology: Computer Application Technology]
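
Illustrative sketch: the abstract mentions bitmap storage and weights derived from item occurrence probability but gives no formulas. The Python below is a minimal single-machine sketch of that general idea, not the authors' algorithm. The function names, the weight formula (1 minus the item's occurrence probability) and the use of the mean item weight for an itemset are all assumptions made purely for illustration; prefix partitioning and the Spark parallelization are not shown.

```python
def item_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for i, transaction in enumerate(transactions):
        for item in set(transaction):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def item_weights(bitmaps, n):
    """Derive a weight per item from its occurrence probability.

    ASSUMPTION: weight = 1 - p(item), so rarer items weigh more.
    The paper's actual weighting model is not given in the abstract.
    """
    return {item: 1.0 - bin(bits).count("1") / n for item, bits in bitmaps.items()}


def weighted_support(itemset, bitmaps, weights, n):
    """Weighted support of a non-empty itemset over n transactions.

    Plain support comes from a bitwise AND of the item bitmaps.
    ASSUMPTION: the itemset weight is the mean of its item weights.
    """
    if not itemset:
        raise ValueError("itemset must not be empty")
    bits = (1 << n) - 1  # start with every transaction selected
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    support = bin(bits).count("1") / n
    mean_weight = sum(weights[item] for item in itemset) / len(itemset)
    return mean_weight * support
```

With bitmaps, counting the transactions that contain an itemset reduces to one AND per item plus a bit count, which is why the representation is commonly used to speed up candidate evaluation.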