Predictive Analysis and Research of Students' Academic Problems Based on Random Forest Algorithm Under Imbalanced Data Sets

Cited by: 2


Authors: LIU Bo[1]; LU Tingting[1]; CHEN Guolei; ZHAO Lu[1] (College of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China)

Affiliation: [1] College of Air Traffic Management, Civil Aviation University of China

Source: Journal of Yibin University, 2019, No. 12, pp. 72-78 (7 pages)

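Funding: National Natural Science Foundation of China, Young Scientists Fund project "Research on Key Technologies for Optimal Allocation of Flight Slot Resources in Airport Clusters Based on Queueing Network Models" (61603396)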

Abstract: Uneven class distribution in a data set limits the performance of the traditional random forest classifier. In academic data sets, students with poor grades form a minority class, which makes the prediction problem imbalanced. To improve predictive performance under this imbalance, this paper proposes SER, a combined classification model that pairs SMOTEENN hybrid sampling with a random forest classifier, and applies it to classify students' academic performance. The paper also compares five models, including random forest, across ten sampling methods for imbalanced data sets. Experimental results show that SER gives the best prediction of students' academic performance, with an F1-Score of 0.98 and a Recall of 0.97, which meets the intended goal.
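The hybrid sampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names, and the parameters `k` and `seed` are assumptions made for illustration. SMOTE is simplified to interpolating between a minority sample and one of its nearest minority neighbours, and ENN to dropping any sample whose label disagrees with the majority of its `k` nearest neighbours. It assumes two classes and at least two minority samples.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def smote(X, y, minority, k=3, seed=0):
    """Oversample the minority class by interpolation until both classes are equal in size."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        # Drop the first hit: it is `base` itself (or an identical point) at distance 0.
        neighbours = k_nearest(base, minority_pts, k + 1)[1:]
        other = minority_pts[rng.choice(neighbours)]
        gap = rng.random()  # position of the synthetic point on the segment base -> other
        X_new.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
        y_new.append(minority)
    return X_new, y_new


def edited_nearest_neighbours(X, y, k=3):
    """Keep only samples whose label matches the majority label of their k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        majority_label = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority_label == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


def smoteenn(X, y, minority, k=3, seed=0):
    """Hybrid sampling: oversample with SMOTE, then clean the result with ENN."""
    X_over, y_over = smote(X, y, minority, k=k, seed=seed)
    return edited_nearest_neighbours(X_over, y_over, k=k)


# Hypothetical toy data (not from the paper): label 1 marks the minority class.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(8)]
y = [0] * 40 + [1] * 8

X_res, y_res = smoteenn(X, y, minority=1)
print("before:", Counter(y), "after:", Counter(y_res))
```

In the SER method the resampled data would then be used to train a random forest. In practice this is usually done with library implementations, for example `SMOTEENN` from imbalanced-learn followed by `RandomForestClassifier` from scikit-learn, rather than hand-written code like the above.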

Keywords: students' academic problems; imbalanced data sets; hybrid sampling; random forest; classification

Classification number: TP399.4 [Automation and Computer Technology: Computer Application Technology]

 
