Research on a prediction method for the poverty-returning risk of the poverty-alleviated population based on Stacking ensemble learning


Authors: LIU Hongda; SUN Xiaohua[3]; WANG Bin[1,2]; WANG Chao; WANG Fushun[1,2]

Affiliations: [1] School of Information Science and Technology, Hebei Agricultural University, Baoding 071001, Hebei, China; [2] Hebei Key Laboratory of Agricultural Big Data, Baoding 071000, Hebei, China; [3] Hebei Software Vocational and Technical College, Baoding 071000, Hebei, China

Source: Journal of Hebei Agricultural University, 2024, No. 6, pp. 75-82 (8 pages)

Funding: Key Research and Development Program of Hebei Province (22327403D).

Abstract: The risk that people lifted out of poverty fall back into it is a major factor affecting the effective linkage between poverty-alleviation achievements and rural revitalization. Accurately predicting this potential risk is crucial for guiding policy implementation, resource allocation, and risk assessment. This paper proposes a poverty-returning risk prediction method based on Stacking ensemble learning. Using desensitized monitoring data on poverty-alleviated households in Province H, correlation analysis and importance ranking of the data features were performed to identify and select the key features that significantly affect poverty-returning risk. On the key-feature data, an inter-model correlation analysis was carried out across independent models including Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM). eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and SVM, which showed low mutual correlation and high prediction accuracy, were chosen as base learners, with RF as the meta-learner, to build the Stacking ensemble prediction model. The 412,919 records were split 7:3 into training and validation sets, and the model was evaluated by accuracy, precision, recall, and F1-score. The experimental results show that the Stacking-based model outperformed every single model on all evaluation metrics: its prediction accuracy was 3.64%, 10.96%, 3.15%, 2.29%, and 5.41% higher than that of RF, NB, SVM, XGBoost, and AdaBoost, respectively, reaching 95.65%, which verifies the effectiveness of the proposed method. The study offers a new approach to consolidating and expanding poverty-alleviation achievements and to improving the timeliness of dynamic monitoring and early warning of poverty return.
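The 7:3 split and the four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_7_3` and `binary_metrics`, the random shuffling with a fixed seed, and the treatment of the task as binary (label 1 meaning "at risk of returning to poverty") are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle records and split them 7:3 into (training, validation) lists.

    The shuffle and the seed are illustrative assumptions; the abstract
    only states the 7:3 ratio.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])` counts 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics equal 0.75.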

Keywords: Stacking ensemble learning; poverty-returning risk prediction; machine learning; feature selection; correlation analysis

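Classification: F323.8 [Economics and Management: Industrial Economics]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]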

 
