Source: Computer Engineering and Design (《计算机工程与设计》), 2018, No. 1, pp. 126-133, 145 (9 pages)
Abstract: Traditional data mining algorithms run into memory-usage and computing-performance problems when processing large volumes of data. To address this, the classic Apriori data mining algorithm is implemented on the Hadoop platform, using the HBase storage system for distributed storage of massive data and the MapReduce framework for distributed computation. The implemented Apriori algorithm is then optimized by introducing the idea of the FIS-IS algorithm, improving it in two directions: the number of database scans and the reduction of database size. Two optimizations are proposed: a method that generates frequent candidate itemsets from the data itself, and a grouped-retrieval method for the pruning step applied to those candidates. Experimental results show that the improved algorithm yields a good optimization effect on running performance.
Keywords: Apriori algorithm; data mining algorithm; distributed implementation; Hadoop platform; MapReduce framework
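
Illustration: the abstract describes running Apriori as distributed counting jobs. The sketch below shows only the textbook Apriori loop, with support counting split into a map step and a reduce step, in plain single-process Python. It is not the paper's implementation: it does not use Hadoop or HBase, and it does not include the FIS-IS-based optimizations. All function names (map_count, reduce_count, generate_candidates, apriori) and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_count(transaction, candidates):
    """Map step: emit (candidate, 1) for every candidate contained in the transaction."""
    items = frozenset(transaction)
    return [(c, 1) for c in candidates if c <= items]


def reduce_count(pairs, min_support):
    """Reduce step: sum the counts per candidate and keep those meeting min_support."""
    totals = Counter()
    for candidate, n in pairs:
        totals[candidate] += n
    return {c: n for c, n in totals.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets; prune any with an infrequent subset."""
    prev = set(frequent)
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets (one data pass per level)."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    k, result = 1, {}
    while candidates:
        # On a real cluster each transaction would be handled by some mapper;
        # here the map outputs are simply concatenated in memory.
        pairs = [p for t in transactions for p in map_count(t, candidates)]
        frequent = reduce_count(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Hypothetical sample data, for illustration only.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent_itemsets = apriori(transactions, min_support=3)
```

Each pass of the while loop corresponds to one full scan of the data, which is the cost the abstract says the improved algorithm tries to reduce.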