An Improved Apriori Algorithm for Big Data and Its Implementation on the Weka Platform  Cited by: 4


Authors: Fan Duofeng (范多锋)[1], Xu Jungang (徐俊刚)[1]

Affiliation: [1] Graduate University of Chinese Academy of Sciences

Source: Electronic Technology (《电子技术(上海)》), 2012, No. 7, pp. 1-4 (4 pages)

Abstract: Building on an analysis of association rules and the principles of the Apriori algorithm, this paper addresses a drawback of Apriori: when the number of transactions is very large, scanning the database places an excessive load on system I/O and on the CPU. It proposes an improved algorithm aimed mainly at raising Apriori's performance on large data volumes. The main idea is to use sampling and transaction compression to reduce the number of transactions the algorithm has to scan, and thereby improve its efficiency. The improved algorithm is implemented on Weka, a mainstream open-source data mining tool. The experimental results show that the algorithm is effective.
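The abstract names two techniques, sampling and transaction compression, without giving their details. The sketch below is one common way to combine them and is not the authors' implementation or their Weka code. The function names `sample_transactions` and `apriori_compressed`, the `fraction` and `seed` parameters, and the specific compression rule (drop a transaction once it is shorter than the current itemset size or contains no frequent k-itemset) are assumptions made for illustration.

```python
import random
from itertools import combinations


def sample_transactions(transactions, fraction, seed=0):
    """Draw a simple random sample (without replacement) of the transactions.

    Hypothetical helper: the paper's actual sampling scheme is not given.
    """
    rng = random.Random(seed)
    n = max(1, int(len(transactions) * fraction))
    return rng.sample(transactions, n)


def apriori_compressed(transactions, min_support):
    """Level-wise Apriori that shrinks the transaction list after each pass.

    min_support is a fraction of the number of input transactions.
    Returns a dict mapping frozenset(itemset) -> absolute support count.
    """
    active = [frozenset(t) for t in transactions]
    min_count = min_support * len(active)

    # Pass 1: count single items.
    counts = {}
    for t in active:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 1
    while frequent:
        k += 1
        # Join frequent (k-1)-itemsets; keep a candidate only if all of its
        # (k-1)-subsets are frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Compression step 1: a transaction with fewer than k items
        # cannot contain any k-itemset.
        active = [t for t in active if len(t) >= k]

        counts = dict.fromkeys(candidates, 0)
        matched = []  # (transaction, candidates it contains)
        for t in active:
            hits = [c for c in candidates if c <= t]
            for c in hits:
                counts[c] += 1
            matched.append((t, hits))

        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)

        # Compression step 2: a transaction containing no frequent k-itemset
        # cannot contain a frequent (k+1)-itemset, so later passes skip it.
        active = [t for t, hits in matched if any(c in frequent for c in hits)]

    return result
```

A possible usage pattern is `apriori_compressed(sample_transactions(data, 0.1), 0.05)`, where `data` is a list of item sets. Itemsets mined from a sample are only approximately those of the full database, so a lower support threshold on the sample or a verification pass over the full data may be needed. Whether the paper does either is not stated in the abstract.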

Keywords: data mining; association rules; Apriori; transaction compression; sampling; Weka

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
