Improving Ensemble Learning Algorithms on Imbalanced Data


Authors: WANG Lu; CHENG Xiaorong (School of Control and Computer Engineering, North China Electric Power University, Baoding 071000)

Affiliation: [1] School of Control and Computer Engineering, North China Electric Power University (Baoding), Baoding 071000

Source: Computer & Digital Engineering, 2025, No. 1, pp. 26-30 (5 pages)

Abstract: In recent years, machine learning has attracted growing research attention, and ensemble learning is one of the field's focal points. The basic principle of ensemble learning is to combine many independent classifiers into a single strong learner, overcoming the limitations of any one classifier on its own. Building on a comparison of four algorithms, namely Bagging, Random Forest, weighted KNN (K-Nearest Neighbor), and AdaBoost, the weighted KNN algorithm and the AdaBoost algorithm are fused together. The dataset used records the shopping behavior of online users. In the experiments, the imbalanced data is first processed with SMOTE sampling, and the four algorithms above are then evaluated and compared against the improved AdaBoost algorithm. The comparison shows that the improved AdaBoost algorithm achieves better prediction performance. Finally, the improved AdaBoost algorithm is run in parallel on the Spark platform to improve computational efficiency.
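The pipeline described in the abstract (SMOTE oversampling, then AdaBoost with weighted KNN as the base learner) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: since KNN has no native per-sample weights, each boosting round here trains on a weight-driven bootstrap resample, and the function names, the parameters `k` and `rounds`, and the toy data are all illustrative; the Spark parallelization step is omitted.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: synthesize n_new minority samples by interpolating
    a random minority sample toward one of its k nearest minority neighbors."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbors)
        lam = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

def weighted_knn_predict(X_train, y_train, X_query, k=5):
    """Distance-weighted KNN for labels in {-1, +1}: each of the k nearest
    neighbors votes with weight 1/distance."""
    preds = np.empty(len(X_query))
    for q, x in enumerate(X_query):
        d = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(d)[:k]
        w = 1.0 / (d[idx] + 1e-9)
        s = np.sign(np.sum(w * y_train[idx]))
        preds[q] = s if s != 0 else 1.0
    return preds

def adaboost_weighted_knn(X, y, rounds=10, k=5, rng=None):
    """AdaBoost over weighted-KNN base learners. KNN cannot take sample
    weights directly, so each round fits on a bootstrap resample drawn
    in proportion to the current AdaBoost weights."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(rounds):
        idx = rng.choice(n, size=n, p=w)          # weight-driven resample
        Xb, yb = X[idx], y[idx]
        pred = weighted_knn_predict(Xb, yb, X, k)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # learner's vote strength
        w *= np.exp(-alpha * y * pred)            # up-weight misclassified
        w /= w.sum()
        learners.append((Xb, yb))
        alphas.append(alpha)
    def predict(X_query):
        score = sum(a * weighted_knn_predict(Xb, yb, X_query, k)
                    for a, (Xb, yb) in zip(alphas, learners))
        return np.where(score >= 0, 1.0, -1.0)
    return predict
```

A usage sketch on synthetic imbalanced data: oversample the minority class with `smote` until the classes balance, then fit and query the boosted model.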

Keywords: ensemble learning; AdaBoost algorithm; SMOTE sampling; weighted KNN (K-Nearest Neighbor) algorithm; imbalanced data; Spark platform

Classification Number: TP301.6 [Automation and Computer Technology / Computer System Architecture]

 
