Authors: Luo Shaofu (罗少甫) [1]; Liu He (刘河) [2]
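Affiliations: [1] Department of Basic Sciences, Chongqing Aerospace Vocational and Technical College, Chongqing 400021, China; [2] Chongqing Academy of Education Science, Chongqing 400015, China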
Source: Statistics & Decision (《统计与决策》), 2024, No. 22, pp. 59-64 (6 pages)
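Funding: Science and Technology Research Program of the Chongqing Municipal Education Commission (KJQN202203007); Chongqing Education Science Planning Project (K22YG218233); Chongqing Special Project for Performance Incentive and Guidance of Research Institutes (cstc2022jxj10214); Key Project of the Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-K202114401).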
Abstract: Sample reduction is a prominent data preprocessing paradigm in statistical machine learning. It removes redundant samples and noise from a labeled training set, thereby improving the performance of classification statistical algorithms. Although many sample reduction methods based on evolutionary algorithms have been proposed and shown to be effective, the existing methods of this kind depend on too many parameters. Moreover, as the number of samples in the labeled training set grows, their search efficiency is low and their time cost is high. To overcome these problems, this paper proposes a sample reduction method based on hybrid backbone binary particle swarm optimization (SRM-HBPSO). In SRM-HBPSO, first, a hybrid backbone binary particle swarm optimization (HBPSO) algorithm combined with a search space reduction strategy is designed. Second, HBPSO is used to optimize the labeled training set, yielding an optimized reduced subset. Finally, SRM-HBPSO trains a given classification statistical algorithm on this reduced subset, thereby improving its performance. Simulation experiments on 10 real benchmark data sets from fields including finance, medicine, and imaging show that SRM-HBPSO outperforms 5 advanced sample reduction algorithms in terms of both the average classification accuracy of the random forest classification statistical algorithm and the average sample reduction rate.
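Keywords: statistical machine learning; classification statistical algorithm; sample reduction; random forest; search space reduction strategy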
Classification codes: O212 [Science: Probability Theory and Mathematical Statistics]; TP391 [Science: Mathematics]