Study on XGBoost Improved Method Based on Genetic Algorithm and Random Forest (基于遗传算法与随机森林的XGBoost改进方法研究)  Cited by: 32


Authors: 王晓晖 (WANG Xiao-hui); 张亮 (ZHANG Liang)[1]; 李俊清 (LI Jun-qing); 孙玉翠 (SUN Yu-cui)[1]; 田捷 (TIAN Jie); 韩睿毅 (HAN Rui-yi)

Affiliations: [1] School of Information Science and Engineering, Shandong Agricultural University, Taian, Shandong 271018, China [2] Agricultural Big Data Research Center, Shandong Agricultural University, Taian, Shandong 271018, China

Source: Computer Science (《计算机科学》), 2020, Issue S02, pp. 454-458, 463 (6 pages)

Funding: Research on Big-Data-Driven Joint Flood Control Operation of River Basin Reservoir Groups (2019GSF111043).

Abstract: Regression prediction is an important research direction in machine learning with a wide range of applications. To further improve the accuracy of regression prediction, an improved XGBoost method based on a genetic algorithm and random forest (GA_XGBoost_RF) is proposed. First, exploiting the good search ability and flexibility of the genetic algorithm (GA), the parameters of the XGBoost algorithm and the random forest (RF) algorithm are tuned with the average cross-validation score as the objective function value, and the better parameter sets are selected to build the GA_XGBoost and GA_RF models, respectively. Then GA_XGBoost and GA_RF are combined with variable weights: the mean squared error between the predicted and true values on the training set serves as the objective function, and the genetic algorithm determines the model weights. Experiments on UCI data sets show that, compared with XGBoost, Random Forest, GA_XGBoost and GA_RF, the GA_XGBoost_RF method outperforms the single models in mean squared error, absolute error and goodness of fit on most data sets. In goodness of fit, the proposed method improves by about 0.01% to 2.1% across the different data sets, making it an effective regression prediction method.
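The weighted-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the GA operators or how the weights are encoded, so the single convex weight `w` (with `1 - w` for the second model), the truncation selection, arithmetic crossover, Gaussian mutation, and all function and parameter names (`ga_weight`, `pop_size`, `generations`, `mutation_sigma`) are assumptions made for illustration. The two prediction lists stand in for the training-set outputs of GA_XGBoost and GA_RF.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def combine(w, pred_a, pred_b):
    """Weighted combination: w * model A + (1 - w) * model B (assumed form)."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def ga_weight(y_true, pred_a, pred_b, pop_size=30, generations=60,
              mutation_sigma=0.05, seed=0):
    """Search for w in [0, 1] that minimizes the training-set MSE.

    Hypothetical GA settings: truncation selection with elitism,
    arithmetic crossover, Gaussian mutation clipped to [0, 1].
    """
    rng = random.Random(seed)

    def fitness(w):
        # Objective from the abstract: MSE of the combined prediction.
        return mse(y_true, combine(w, pred_a, pred_b))

    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism).
        elite = sorted(population, key=fitness)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            alpha = rng.random()
            child = alpha * p1 + (1.0 - alpha) * p2   # arithmetic crossover
            child += rng.gauss(0.0, mutation_sigma)   # Gaussian mutation
            children.append(min(1.0, max(0.0, child)))
        population = elite + children
    return min(population, key=fitness)


if __name__ == "__main__":
    # Toy stand-ins for the two models' training-set predictions.
    pred_xgb = [1.0, 2.0, 3.0, 4.0, 5.0]
    pred_rf = [1.5, 1.0, 3.8, 3.1, 5.9]
    y_train = [0.7 * a + 0.3 * b for a, b in zip(pred_xgb, pred_rf)]
    w = ga_weight(y_train, pred_xgb, pred_rf)
    print("weight for model A:", w)
    print("combined MSE:", mse(y_train, combine(w, pred_xgb, pred_rf)))
```

For a single scalar weight the optimum also has a closed form, so a GA is not strictly needed here; the sketch only shows how the stated objective (training-set MSE) can drive a genetic search over the model weights.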

Keywords: Regression prediction; XGBoost; Combination forecasting; Random forest; Genetic algorithm

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
