Learning Result Prediction Based on Improved SMOTE Algorithm and Ensemble Model (times cited: 1)

Authors: WANG Xiaoyong (王晓勇)[1], HU Shengli (胡胜利)[2]

Affiliations: [1] School of Information Engineering, Huainan Union University, Huainan, Anhui 232038, China; [2] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China

Source: Journal of North University of China (Natural Science Edition), 2024, No. 3, pp. 257-264 (8 pages)

Funding: Anhui Provincial Key Scientific Research Project (KJ2021A1306).

Abstract: A single machine learning algorithm often applies poorly across data classification and prediction tasks in different domains, and severe class imbalance in a dataset further degrades prediction performance. To address both problems, a learning result prediction method based on the Synthetic Minority Over-sampling Technique (SMOTE) and an ensemble model is proposed. The traditional SMOTE algorithm generates new synthetic samples by interpolating between minority-class samples, which can introduce noise and produce synthetic samples that are highly similar to one another. An improved SMOTE algorithm is therefore proposed that uses distance calculations to remove noisy and easily confused samples, yielding clean, highly discriminative synthetic samples. An ensemble method then adjusts the weights of the samples and of the base classifiers and combines them into a stronger classifier with better classification performance. Experiments on the public online learning dataset Kalboard360 show that, with the Extremely Randomized Trees (ERT) classifier, combining the improved SMOTE algorithm with the ensemble model achieves a prediction accuracy of 97.9%, an improvement of 5.5% over a single ERT classifier. This demonstrates that the improved SMOTE algorithm generates high-quality balanced data and that the ensemble learning model performs significantly better than a single machine learning algorithm.
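The abstract names the ingredients but not their exact rules, so the following pure-Python sketch only illustrates them under stated assumptions. `smote` is classic SMOTE interpolation, x_new = x_i + λ(x_nn - x_i) with λ drawn uniformly from [0, 1) and x_nn one of the k nearest minority neighbours of x_i. `filter_synthetic` is a HYPOTHETICAL distance-based cleaning step, not the authors' rule: it drops a synthetic sample whose k nearest real samples are mostly majority-class (treated as noisy or easily confused), and one lying within an assumed threshold `min_gap` of an already-kept synthetic sample (treated as redundant). `adaboost_round` assumes the ensemble's "sample and classifier weight adjustment" resembles a discrete AdaBoost step; the abstract does not confirm this. All function and parameter names are illustrative.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE: x_new = x_i + lam * (x_nn - x_i), with lam ~ U[0, 1) and
    x_nn drawn from the k nearest minority-class neighbours of x_i."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x_i = minority[i]
        # k nearest minority neighbours of x_i, excluding x_i itself (by index)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(x_i, p))[:k]
        x_nn = rng.choice(neighbours)
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_nn)])
    return synthetic


def filter_synthetic(synthetic, minority, majority, k=5, min_gap=0.05):
    """HYPOTHETICAL cleaning rule (an assumption, not the paper's exact method):
    1. noise/confusable: drop a sample if most of its k nearest REAL samples
       belong to the majority class;
    2. redundancy: drop a sample closer than min_gap to an already-kept one."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    kept = []
    for s in synthetic:
        neighbours = sorted(labelled, key=lambda pl: euclidean(s, pl[0]))[:k]
        majority_votes = sum(1 for _, label in neighbours if label == 0)
        if majority_votes > k // 2:
            continue  # lies inside the majority region
        if any(euclidean(s, q) < min_gap for q in kept):
            continue  # near-duplicate of an earlier synthetic sample
        kept.append(s)
    return kept


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost step (assumed ensemble scheme). `weights` must sum
    to 1. Returns the classifier weight alpha and renormalised sample weights,
    in which misclassified samples are up-weighted."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(alpha if t != p else -alpha)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


if __name__ == "__main__":
    rng = random.Random(42)
    minority = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(10)]
    majority = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(40)]
    synthetic = smote(minority, n_new=30, k=3, rng=rng)
    clean = filter_synthetic(synthetic, minority, majority, k=5, min_gap=0.05)
    print(f"{len(synthetic)} synthetic samples, {len(clean)} kept after filtering")
```

In practice, imbalanced-learn's `SMOTE` and scikit-learn's `AdaBoostClassifier` and `ExtraTreesClassifier` would replace these hand-written helpers; the pure-Python version only makes each step explicit.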

Keywords: machine learning; neural network; data mining; ensemble learning; data balancing; learning result prediction

Chinese Library Classification (CLC) number: TP391 (Automation and Computer Technology / Computer Application Technology)
