Pre-Pruning and Optimization of Decision Tree Classification Algorithm

Cited by: 11


Authors: ZHENG Lijia (郑力嘉); SONG Bing (宋冰) [1]

Affiliation: [1] School of Information Science and Engineering, East China University of Science and Technology, Shanghai 201424, China

Source: Process Automation Instrumentation (《自动化仪表》), 2023, No. 5, pp. 56-62 (7 pages)

Abstract: The decision tree is an intuitive and effective classification algorithm. This work optimizes the algorithm with respect to two important factors that affect its classification performance: the attribute selection measure and the pre-pruning parameters. Using precipitation prediction for a location in Australia as a case study, iterative dichotomiser 3 (ID3) and classification and regression tree (CART) models are built and optimized. Data preprocessing and pre-pruning improve the algorithm, effectively prevent overfitting, and raise the classification performance of the decision tree. The parameters of both models are tuned by cross-validation, which improves prediction accuracy. The performance comparison shows that the decision tree built on the Gini index is more accurate. For this tree, a three-dimensional grid search around the tuned parameters finds the optimal parameters and achieves a still higher prediction accuracy.
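The two attribute selection measures named in the abstract can be illustrated with a minimal sketch. ID3 ranks candidate splits by information gain (the decrease in entropy), while CART ranks them by the decrease in Gini impurity. The code below is not the authors' implementation: the function names, the `min_samples_leaf` and `min_decrease` thresholds, and the toy "rain"/"dry" labels are all assumptions chosen for illustration, and the pre-pruning check shows only one plausible way such parameters might stop a split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of the children.

    With measure=entropy this is the information gain used by ID3;
    with measure=gini it is the Gini decrease used by CART.
    """
    n = len(parent)
    weighted = sum(len(ch) / n * measure(ch) for ch in children)
    return measure(parent) - weighted


def allow_split(parent, children, measure, min_samples_leaf=1, min_decrease=0.0):
    """Hypothetical pre-pruning rule: reject a split that creates a leaf
    that is too small or that reduces impurity by too little."""
    if any(len(ch) < min_samples_leaf for ch in children):
        return False
    return impurity_decrease(parent, children, measure) > min_decrease


# Toy data (assumed): 8 days, half of them rainy, split by one candidate attribute.
parent = ["rain"] * 4 + ["dry"] * 4
children = [["rain"] * 3 + ["dry"], ["rain"] + ["dry"] * 3]

print(impurity_decrease(parent, children, entropy))  # information gain
print(impurity_decrease(parent, children, gini))     # Gini decrease
print(allow_split(parent, children, gini, min_samples_leaf=5))
```

On this toy split both measures agree that the split is informative, but their numeric scales differ, so a threshold such as `min_decrease` would need to be tuned separately for each measure, for example by cross-validation.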

Keywords: decision tree; classification algorithm; information gain; Gini index; cross-validation; pre-pruning

Classification code: TH764 [Mechanical Engineering: Instrument Science and Technology]

 
