Matrix-based association rule incremental updating and its improved algorithm  Cited by: 8

Authors: GENG ZhiQiang[1,2], ZHANG Yang[1,2], HAN YongMing[1,2]

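Affiliations: [1] College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China; [2] Engineering Research Center of Intelligent PSE, Ministry of Education, Beijing University of Chemical Technology, Beijing 100029, China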

Source: Journal of Beijing University of Chemical Technology (Natural Science Edition), 2016, No. 5, pp. 89-94 (6 pages)

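Funding: National Natural Science Foundation of China (61374166); Beijing Natural Science Foundation (4162045); Doctoral Program Foundation of the Ministry of Education (20120010110010); Fundamental Research Funds for the Central Universities (JD1502)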

Abstract: To mine association rules efficiently and update them incrementally in a big-data setting, this paper first discusses a series of improved algorithms based on the fast updating pruning (FUP) algorithm and then proposes a matrix-based incremental updating method for association rules, matrix FUP (MFUP). MFUP converts the dataset into a Boolean matrix, which reduces both the number of dataset scans and the storage the dataset requires. An experimental study of incremental updating of frequent items shows that, for the same amount of data, MFUP takes less time than FUP to mine association rules and apply incremental updates, and that its execution time grows more slowly as the dataset grows. A second experiment indicates that the execution time of both algorithms decreases as the support threshold increases. MFUP is then combined with the Hadoop distributed computing framework, which is used when updating the matrix for incremental datasets, to give an improved algorithm for distributed environments, Cloud MFUP (CMFUP). As the dataset grows in the distributed environment, CMFUP takes less time than the MapReduce FUP (MRFUP) algorithm, and its execution time grows more slowly. In addition, the execution time decreases as the number of cluster datanodes increases.
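The abstract describes the Boolean-matrix idea only at a high level, so the following is a minimal illustrative sketch and not the authors' MFUP implementation. It assumes an item-by-transaction layout in which each row is stored as a Python integer bitmask, so the support count of an itemset is the number of set bits in the AND of its rows. The names `BooleanMatrix`, `support_count`, and `update_counts` are hypothetical. For itemsets whose counts over the original data are cached, the update examines only the columns added by the increment.

```python
from itertools import combinations


class BooleanMatrix:
    """Item-by-transaction Boolean matrix; each row is an int bitmask (assumed layout)."""

    def __init__(self):
        self.rows = {}   # item -> bitmask; bit t is set if transaction t contains the item
        self.n_tx = 0    # number of transactions (columns) stored so far

    def append(self, transactions):
        """Add transactions as new columns; existing columns are left untouched."""
        for tx in transactions:
            bit = 1 << self.n_tx
            for item in set(tx):
                self.rows[item] = self.rows.get(item, 0) | bit
            self.n_tx += 1

    def support_count(self, itemset, start=0):
        """Count columns with index >= start that contain every item of the itemset."""
        mask = (1 << self.n_tx) - 1
        for item in itemset:
            mask &= self.rows.get(item, 0)   # row-wise AND
        return bin(mask >> start).count("1")


def update_counts(matrix, old_counts, old_n, candidates):
    """Whole-database counts: cached itemsets only look at columns added after old_n."""
    new_counts = {}
    for itemset in candidates:
        if itemset in old_counts:
            new_counts[itemset] = old_counts[itemset] + matrix.support_count(itemset, start=old_n)
        else:
            # No cached count: fall back to an AND over all columns.
            new_counts[itemset] = matrix.support_count(itemset)
    return new_counts


# Toy data (made up for illustration only).
original = [["a", "b", "c"], ["a", "c"], ["b", "d"]]
increment = [["a", "c", "d"], ["a", "b"]]

m = BooleanMatrix()
m.append(original)
pairs = [frozenset(p) for p in combinations(sorted(m.rows), 2)]
old_counts = {p: m.support_count(p) for p in pairs}
old_n = m.n_tx

m.append(increment)
new_counts = update_counts(m, old_counts, old_n, pairs)

min_support = 0.5  # hypothetical threshold
frequent = sorted(sorted(p) for p in pairs if new_counts[p] >= min_support * m.n_tx)
print(frequent)
```

In this sketch the original transactions are never re-read: the increment only appends columns, and cached counts are extended by counting bits in the new column range. How MFUP itself prunes candidates, and how CMFUP partitions the matrix across Hadoop nodes, is not specified in the abstract and is not modelled here.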

Keywords: fast updating pruning (FUP) algorithm; association rules; incremental updating; Hadoop platform; Boolean matrix

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
