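Affiliation: [1] School of Computer Science and School of Software, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China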
Source: Journal of Nanjing Normal University (Natural Science Edition), 2016, No. 4, pp. 25-30 (6 pages)
Funding: National Natural Science Foundation of China (61171053)
Abstract: When traditional decision tree algorithms mine massive data sets on a single machine, they are limited by that machine's computing power and storage capacity, which leads to long running times, poor fault tolerance, and small storage capacity. The Hadoop platform, with its high reliability and high fault tolerance, offers a new approach to parallelizing decision tree algorithms. This paper designs and implements a parallel SPRINT classification algorithm based on the Hadoop platform. Experimental results show that, compared with the non-parallelized SPRINT algorithm, the Hadoop-based SPRINT algorithm achieves better classification accuracy, lower time complexity, and better parallel performance, and it significantly speeds up the search for the best split point.
Keywords: Hadoop; MapReduce; data mining; decision tree; SPRINT algorithm
CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
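The abstract states that the parallel version mainly speeds up the search for the best split point. This record does not say how the authors divide that work across MapReduce jobs, so the following is only a minimal local sketch under stated assumptions: it uses SPRINT's Gini-index criterion on sorted attribute lists, treats each attribute list as one hypothetical map task that finds its locally best threshold, and uses a reduce step that keeps the lowest Gini score overall. It is plain Python with no Hadoop API, and the function names (`best_split_for_attribute`, `reduce_best`, `find_best_split`) are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One SPRINT-style attribute list entry: (attribute value, class label, record id).
AttrList = List[Tuple[float, str, int]]
# Candidate split: (gini_split score, attribute name, threshold).
Candidate = Tuple[float, str, float]


def gini(hist: Counter, n: int) -> float:
    """Gini index of a partition with class histogram `hist`: 1 - sum_j p_j^2."""
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in hist.values())


def best_split_for_attribute(attr: str, rows: AttrList) -> Candidate:
    """Hypothetical map task: scan one continuous attribute list with
    'below' and 'above' class histograms and return the best test of the
    form value <= threshold. Returns an infinite score if no split exists."""
    # Sorted here so the sketch is self-contained; SPRINT sorts
    # continuous attribute lists once, before tree growth starts.
    rows = sorted(rows)
    n = len(rows)
    above = Counter(label for _, label, _ in rows)
    below: Counter = Counter()
    best: Candidate = (float("inf"), attr, float("nan"))
    for i in range(n - 1):
        value, label, _ = rows[i]
        below[label] += 1
        above[label] -= 1
        next_value = rows[i + 1][0]
        if value == next_value:
            continue  # only split between distinct attribute values
        n1, n2 = i + 1, n - (i + 1)
        score = n1 / n * gini(below, n1) + n2 / n * gini(above, n2)
        if score < best[0]:
            best = (score, attr, (value + next_value) / 2)
    return best


def reduce_best(candidates: List[Candidate]) -> Candidate:
    """Hypothetical reduce step: keep the candidate with the lowest gini_split."""
    return min(candidates, key=lambda c: c[0])


def find_best_split(attribute_lists: Dict[str, AttrList]) -> Candidate:
    # Each attribute list could be processed by a separate map task.
    mapped = [best_split_for_attribute(a, rows) for a, rows in attribute_lists.items()]
    return reduce_best(mapped)


if __name__ == "__main__":
    ages = [(23.0, "high", 0), (17.0, "high", 1), (43.0, "high", 2),
            (68.0, "low", 3), (32.0, "low", 4), (20.0, "high", 5)]
    print(find_best_split({"age": ages}))
```

On the small six-record example in the `__main__` block, the scan picks the test `age <= 27.5` with a Gini score of 2/9. Splitting per attribute list is a natural unit of parallelism because each list carries its own class labels and can be scanned independently. Whether the paper parallelizes along this axis or by partitioning records is not stated in this record.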