Distributed high utility sequence pattern mining based on multi utility thresholds (基于多效用阈值的分布式高效用序列模式挖掘)  Cited by: 1


Authors: ZENG Yi (曾毅) [1]; ZHANG Fu-quan (张福泉)

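Affiliations: [1] Computer and Information Engineering Department, Guangxi University Xingjian College of Science and Liberal Arts, Nanning 530005, Guangxi, China; [2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China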

Source: Computer Engineering and Design (《计算机工程与设计》), 2020, No. 2, pp. 449-457 (9 pages)

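Funding: Guiding Fund Project of the Fujian Provincial Department of Science and Technology (2018H0028); 2019 Basic Research Ability Improvement Project for Young and Middle-aged University Teachers in Guangxi, Department of Education of Guangxi Zhuang Autonomous Region (2019KY0960)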

Abstract: High utility pattern mining over sequence patterns suffers from a large search space and high computational complexity. To address this, a distributed high utility sequence pattern mining algorithm based on multiple utility thresholds is proposed. An array structure stores the utility information of patterns, which avoids the high memory consumption caused by a utility matrix. Deep pruning strategies for 1-itemsets and 2-itemsets are designed to further shrink the search space of candidate patterns, reducing both search time and memory (cache) cost. A distributed implementation of the mining algorithm is also proposed, which lowers mining time further through parallel processing. Experiments on medium-scale and large-scale sequence datasets show that the algorithm effectively reduces the number of candidate patterns, lowers the time and storage cost of mining, and exhibits good scalability and stability on large datasets.
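To make the terms in the abstract concrete, the following is a minimal Python sketch of utility computation under multiple utility thresholds. It is not the paper's algorithm: the abstract does not give its data structures or pruning rules. The sketch assumes (1) a pattern is a tuple of single items matched in strictly increasing itemset positions, (2) a pattern's threshold is the minimum of its items' thresholds, a common convention in multi-threshold mining, and (3) a generic sequence-weighted-utility (SWU) filter for single items as a stand-in for pruning. All names (`pattern_utility`, `prune_items_by_swu`, `profit`, `min_util`) and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# A q-sequence: ordered itemsets, each mapping item -> purchased quantity.
QSequence = List[Dict[str, int]]


def sequence_utility(seq: QSequence, profit: Dict[str, int]) -> int:
    """Total utility of one q-sequence (quantity * unit profit, summed)."""
    return sum(qty * profit[item] for itemset in seq for item, qty in itemset.items())


def pattern_utility(pattern: Tuple[str, ...], seq: QSequence, profit: Dict[str, int]) -> int:
    """Maximum utility of `pattern` over all its occurrences in `seq` (0 if absent)."""
    # best[k]: best utility of matching the first k pattern items so far (None = no match).
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in seq:
        # Go downward in k so one itemset fills at most one pattern position.
        for k in range(len(pattern), 0, -1):
            item, prev = pattern[k - 1], best[k - 1]
            if item in itemset and prev is not None:
                cand = prev + itemset[item] * profit[item]
                if best[k] is None or cand > best[k]:
                    best[k] = cand
    return best[-1] or 0


def is_high_utility(pattern, db, profit, min_util) -> bool:
    """Assumed rule: compare against the smallest threshold among the pattern's items."""
    threshold = min(min_util[item] for item in pattern)
    return sum(pattern_utility(pattern, s, profit) for s in db) >= threshold


def prune_items_by_swu(db, profit, min_util) -> Set[str]:
    """Keep items whose SWU reaches the least threshold; others cannot appear
    in any high utility pattern, since SWU upper-bounds pattern utility."""
    least = min(min_util.values())
    swu: Dict[str, int] = {}
    for s in db:
        su = sequence_utility(s, profit)
        for item in {i for itemset in s for i in itemset}:
            swu[item] = swu.get(item, 0) + su
    return {item for item, value in swu.items() if value >= least}


# Toy data (hypothetical).
profit = {"a": 2, "b": 5, "c": 1, "d": 1}
min_util = {"a": 20, "b": 30, "c": 40, "d": 25}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 2}],
    [{"b": 1}, {"a": 3, "c": 1}],
    [{"c": 2}, {"d": 1}],
]
kept = prune_items_by_swu(db, profit, min_util)
ba_is_high = is_high_utility(("b", "a"), db, profit, min_util)
ac_is_high = is_high_utility(("a", "c"), db, profit, min_util)
```

Because a pattern's total utility is a sum of per-sequence utilities, partial sums computed on separate partitions of `db` can simply be added together, which is one reason this kind of mining lends itself to a distributed implementation.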

Keywords: sequence pattern; big data; high utility pattern mining; distributed computing; frequent itemset; pruning strategy

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
