Authors: 王璐 (WANG Lu); 程晓荣 (CHENG Xiaorong) (School of Control and Computer Engineering, North China Electric Power University, Baoding 071000)
Affiliation: [1] School of Control and Computer Engineering, North China Electric Power University (Baoding), Baoding 071000
Source: 《计算机与数字工程》 (Computer & Digital Engineering), 2025, No. 1, pp. 26-30 (5 pages)
Abstract: Machine learning has attracted growing research attention in recent years, and ensemble learning is one of its key topics. The basic idea of ensemble learning is to take many independent classifiers and fuse them into a single strong learner, in order to overcome the weaknesses of classification with a single learner. Based on a comparison of four algorithms (Bagging, Random Forest, weighted KNN (K-Nearest Neighbor), and AdaBoost), the weighted KNN algorithm and the AdaBoost algorithm are fused together. The dataset used is a dataset of the shopping behavior of online users. In the experiments, the imbalanced data is first processed with SMOTE sampling, and then the four algorithms above and the improved AdaBoost algorithm are evaluated and compared. The comparison shows that the improved AdaBoost algorithm has better prediction performance. The improved AdaBoost algorithm is then run in parallel on the Spark platform to improve computational efficiency.
Keywords: ensemble learning; AdaBoost algorithm; SMOTE sampling; weighted KNN (K-Nearest Neighbor) algorithm; imbalanced data; Spark platform
Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]
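Illustrative note: the abstract names SMOTE sampling and weighted KNN as building blocks but does not say how they are configured or how weighted KNN is fused with AdaBoost. The sketch below is therefore not the authors' method. It only shows, on hypothetical toy data, what the two named building blocks might look like in their generic textbook forms: SMOTE as interpolation between a minority sample and one of its nearest minority neighbours, and weighted KNN as inverse-distance voting. The function names, the toy points, and the parameters `k`, `seed`, and `eps` are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (generic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def weighted_knn_predict(train, labels, query, k=3, eps=1e-9):
    """Distance-weighted KNN: each of the k nearest neighbours votes with
    weight 1 / (distance + eps); the label with the largest total wins."""
    nearest = sorted(zip(train, labels), key=lambda t: math.dist(t[0], query))[:k]
    votes = {}
    for point, label in nearest:
        weight = 1.0 / (math.dist(point, query) + eps)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)


# Hypothetical imbalanced toy data: 6 majority points (label 0), 3 minority (label 1).
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
X = majority + minority + new_points
y = [0] * len(majority) + [1] * (len(minority) + len(new_points))

print(weighted_knn_predict(X, y, (1.0, 1.0)))
print(weighted_knn_predict(X, y, (0.1, 0.1)))
```

In a boosting setting, one plausible (assumed) way to fuse the two would be to let per-sample boosting weights scale each neighbour's vote; whether the paper does this is not stated in the record.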