基于MapReduce的Bagging决策树优化算法被引量：8

An optimized bagging decision tree algorithm based on MapReduce

作　　者：张元鸣[1] 陈苗[1] 陆佳炜[1] 徐俊[1] 肖刚[1]

出　　处：《计算机工程与科学》2017年第5期841-848,共8页Computer Engineering & Science

基　　金：浙江省重大科技专项(2014C01408);浙江省公益性技术项目(2017C31014)

摘　　要：针对经典C4.5决策树算法存在过度拟合和伸缩性差的问题,提出了一种基于Bagging的决策树改进算法,并基于MapReduce模型对改进算法进行了并行化。首先,基于Bagging技术对C4.5算法进行了改进,通过有放回采样得到多个与初始训练集大小相等的新训练集,并在每个训练集上进行训练,得到多个分类器,再根据多数投票规则集成训练结果得到最终的分类器;然后,基于MapReduce模型对改进算法进行了并行化,能够并行化处理训练集、并行选择最佳分割属性和最佳分割点,以及并行生成子节点,实现了基于MapReduce Job工作流的并行决策树改进算法,提高了对大数据集的分析能力。实验结果表明,并行Bagging决策树改进算法具有较高的准确度与敏感度,以及较好的伸缩性和加速比。In order to address the shortcomings of overfitting and poor scalability of the C4.5 decision tree algorithm, we propose an optimized C4.5 algorithm with Bagging technique, and then parallelize it according to the MapReduce model. The optimized algorithm can obtain multiple new training sets that are equal to the initial training set by sampling with replacement. Multiple classifiers can be obtained by training the algorithm with these new trainingsets. A final classifier is generated according to a majority voting rule that integrates the training results. Then, the optimized algorithm is parallelized in three as- pects, including parallel processing training sets, parallel selecting optimal decomposition attributes and optimal decomposition point, and parallel generating child nodes. A parallel algorithm based on job workflow is implemented to improve the ability of big data analysis. Experimental results show that the parallel and optimized decision tree algorithm has higher accuracy, higher sensitivity, better scalability and higher performance.

关键词：决策树 BAGGING MAPREDUCE模型大数据分析准确性

分类号：TP390.027[自动化与计算机技术—计算机应用技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于MapReduce的Bagging决策树优化算法被引量：8

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于MapReduce的Bagging决策树优化算法 被引量：8

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

基于MapReduce的Bagging决策树优化算法被引量：8