Analysis and Study of SPRINT Algorithm Based on Hadoop Platform (基于Hadoop平台的SPRINT算法的分析与研究)

Cited by: 2

Authors: Huang Gang (黄刚)[1], Sun Yuan (孙媛)[1]

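Affiliation: [1] School of Computer Science and School of Software, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003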

Source: Journal of Nanjing Normal University (Natural Science Edition), 2016, No. 4, pp. 25-30 (6 pages)

Funding: National Natural Science Foundation of China (61171053)

Abstract: When traditional decision tree algorithms mine massive data sets on a single-machine platform, they are constrained by limited computing power and storage capacity, and so suffer from long running times, poor fault tolerance, and small storage capacity. The Hadoop platform, with its high reliability and fault tolerance, offers a new approach to parallelizing decision tree algorithms. This paper designs and implements a parallel SPRINT classification algorithm based on the Hadoop platform. Experimental results show that, compared with the non-parallelized SPRINT algorithm, the Hadoop-based SPRINT classification algorithm achieves better classification accuracy, lower time complexity, and better parallel performance, and that it markedly speeds up the search for the best split point.

Keywords: Hadoop; MapReduce; data mining; decision tree; SPRINT algorithm

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
