Efficient Distributed Incremental Sequence Mining in Big Data Environment  Cited by: 2

Authors: NAN Nan (南楠) [1]; YAN Ying-zhan (严英占) [2]

Affiliations: [1] Basic Education College, Lingnan Normal University, Zhanjiang, Guangdong 524048, China; [2] The 54th Research Institute, China Electronics Technology Group, Shijiazhuang 050081, China

Source: Journal of Southwest China Normal University (Natural Science Edition), 2020, No. 11, pp. 80-85 (6 pages)

Funding: National Natural Science Foundation of China (61404119); Youth Fund of the Hebei Provincial Department of Education (QN2016182).

Abstract: This paper proposes an efficient distributed incremental sequential pattern mining algorithm (Incremental Sequential Pattern Mining, IncSPM), built on the MapReduce architecture, to address the problem of updating sequential patterns each time new data arrives in a big data environment. The algorithm uses backward mining to reuse the sequential patterns produced by earlier mining runs, and introduces a Co-occurrence Reverse Map (CRMAP) data structure to contain the combinatorial explosion of candidate sequences. New candidate generation and early pruning mechanisms further speed up the mining process. The algorithm is evaluated on two real-world datasets, and the experiments show substantial improvements over other methods in execution time, memory consumption, and scalability.
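The abstract does not describe the internal layout of CRMAP, so the following is only a minimal sketch of the general idea of co-occurrence-based candidate pruning with incremental updates, not the paper's data structure. It assumes sequences of single items, and the names `update_cooccurrence` and `can_extend` are hypothetical. The map stores, for each ordered pair (a, b), the number of sequences in which b occurs somewhere after a. Because any sequence containing the pattern p followed by b must also contain b after every item of p, a pair count below the support threshold is enough to discard the candidate without scanning the data.

```python
from collections import defaultdict


def update_cooccurrence(cooc, sequences):
    """Add the counts from `sequences` to cooc[a][b], the number of
    sequences in which item b occurs somewhere after item a.

    Hypothetical helper: calling it again with only the newly arrived
    sequences updates the map without rescanning the old data.
    """
    for seq in sequences:
        pairs = set()   # count each ordered pair at most once per sequence
        seen = set()    # items that occurred earlier in this sequence
        for item in seq:
            for earlier in seen:
                pairs.add((earlier, item))
            seen.add(item)
        for a, b in pairs:
            cooc[a][b] += 1
    return cooc


def can_extend(cooc, pattern, item, min_support):
    """Early pruning test: the candidate `pattern + [item]` can only be
    frequent if `item` follows every item of `pattern` in at least
    `min_support` sequences. This is a necessary condition, not a
    sufficient one, so surviving candidates still need a support count.
    """
    return all(cooc.get(a, {}).get(item, 0) >= min_support for a in pattern)


# Initial database, then an increment that arrives later (toy data).
cooc = defaultdict(lambda: defaultdict(int))
update_cooccurrence(cooc, [["a", "b", "c"], ["a", "c"], ["b", "c"]])
before = can_extend(cooc, ["a"], "b", min_support=2)

update_cooccurrence(cooc, [["a", "b"]])  # only the new data is scanned
after = can_extend(cooc, ["a"], "b", min_support=2)
```

In this toy run, the candidate ("a", "b") is pruned before the increment and survives afterwards, which illustrates why the map has to be kept up to date as data is added. In a MapReduce setting, the per-sequence pair extraction would plausibly sit in the map phase and the count aggregation in the reduce phase, but that split is an assumption rather than something the abstract states.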

Keywords: big data mining; incremental sequential patterns; backward mining; Co-occurrence Reverse Map (CRMAP) data structure

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

 
