Parallel Independent Probability Fully Weighted Association Rule Mining Algorithm


Authors: LI Chengyan (李成严) [1]; LI Xinyu (李鑫宇); ZHANG Lei (张磊) [1]; WANG Guangze (王广泽) (School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China; Harbin University of Science and Technology Library, Harbin University of Science and Technology, Harbin 150080, China)

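Affiliations: [1] School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080 [2] Harbin University of Science and Technology Library, Harbin 150080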

Source: Journal of Harbin University of Science and Technology, 2023, No. 6, pp. 111-120 (10 pages)

Funding: Natural Science Foundation of Heilongjiang Province (LH2021F032).

Abstract: Association rule mining is mainly used to discover knowledge hidden in data. Weighted association rule mining can more effectively mine rules whose items differ in importance. Manually assigned weights are subjective and somewhat arbitrary and do not make full use of the characteristics of the data itself, and serial algorithms cannot handle large data sets. To address these problems, a parallel mining algorithm for independent probability fully weighted association rules is proposed. The algorithm builds a fully weighted model based on the probability with which each item appears in the current data set, so as to mine more of the association rules that users expect. Prefix partitioning and bitmap storage are used to reduce the high time cost of weighted frequent itemset filtering and candidate weighted frequent itemset generation, respectively. Distributed parallel computing is introduced and the algorithm is implemented on the Spark framework, so that it can mine weighted association rules efficiently in a big data environment. The model and algorithm are verified on numerical examples; the results show that the algorithm obtains more hidden information while maintaining superior time efficiency.
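The two ideas named in the abstract, weights derived from item occurrence probability and bitmap storage, can be illustrated with a small single-machine sketch. This is not the authors' implementation. The abstract gives no formulas, so the weight rule (`1 - p`) and the weighted-support definition (mean item weight times support) are placeholder assumptions, and all function names are hypothetical. Prefix partitioning and the Spark parallelisation are left out.

```python
def item_bitmaps(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def occurrence_probability(bitmaps, n):
    """Fraction of the n transactions in which each item appears."""
    return {item: bin(bits).count("1") / n for item, bits in bitmaps.items()}


def support(itemset, bitmaps, n):
    """Support of an itemset: AND the item bitmaps, then count the set bits."""
    bits = (1 << n) - 1  # start with every transaction selected
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    return bin(bits).count("1") / n


def weighted_support(itemset, bitmaps, weights, n):
    """ASSUMPTION: weighted support = mean item weight * support."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support(itemset, bitmaps, n)


# Toy data set, invented for illustration only.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
n = len(transactions)
bitmaps = item_bitmaps(transactions)
prob = occurrence_probability(bitmaps, n)

# ASSUMPTION: rarer items get larger weights; the paper's actual rule may differ.
weights = {item: 1.0 - p for item, p in prob.items()}

for itemset in [("a", "b"), ("a", "c"), ("b", "c")]:
    print(itemset, support(itemset, bitmaps, n),
          weighted_support(itemset, bitmaps, weights, n))
```

Storing each item as a bitmap turns the support count of a candidate itemset into a few bitwise AND operations and a population count, so the transactions are not rescanned for every candidate. That is the kind of saving the abstract attributes to bitmap storage.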

Keywords: association rule mining; fully weighted; independent probability; parallel computing

Classification: TP399 [Automation and Computer Technology: Computer Application Technology]

 
