Authors: Zhao Jinyang; Lu Huiguo [1,2]; Jiang Juanping [1,2]; Yuan Peipei; Liu Xueli
Affiliations: [1] College of Electronic Engineering, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China; [2] Key Laboratory of Atmospheric Sounding of CMA, Chengdu 610225, Sichuan, China; [3] School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China; [4] College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210000, Jiangsu, China
Source: Computer Applications and Software (《计算机应用与软件》), 2019, No. 4, pp. 255-261, 316 (8 pages)
Funding: Key Science and Technology Program of the Sichuan Provincial Department of Education (14ZA0170)
Abstract: Imbalanced datasets are common in domains such as severe weather, fault diagnosis, network attacks, and financial fraud. To address the poor classification performance of the random forest algorithm on imbalanced datasets, this paper proposes a new oversampling method, SCSMOTE (Seed Center Synthetic Minority Over-sampling Technique). The key idea is to select suitable candidate samples from the minority class, compute the center of those candidates, and generate new minority samples between each candidate and that center, which controls the quality of the synthesized minority samples. SCSMOTE is then combined with random forest to handle imbalanced datasets. Comparative experiments on UCI datasets show that the combined approach effectively improves the classification performance of random forest on imbalanced data.
CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
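
The three steps named in the abstract (select candidate minority samples, compute their center, interpolate between each candidate and the center) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how candidates are chosen, so the rule used here (keep the fraction of minority samples closest to the overall minority centroid) is purely an assumption, as are the names `select_candidates`, `scsmote_sketch`, and `keep_fraction`.

```python
import random
from typing import List, Sequence

Vector = List[float]


def centroid(points: Sequence[Sequence[float]]) -> Vector:
    """Arithmetic mean of a non-empty collection of equal-length vectors."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


def select_candidates(minority: Sequence[Sequence[float]],
                      keep_fraction: float = 0.8) -> List[Sequence[float]]:
    """ASSUMPTION: the abstract does not give the selection rule.
    Here we keep the minority samples nearest the overall minority centroid,
    which discards likely outliers."""
    c = centroid(minority)
    ranked = sorted(
        minority,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p, c)),
    )
    k = max(1, int(round(keep_fraction * len(ranked))))
    return ranked[:k]


def scsmote_sketch(minority: Sequence[Sequence[float]],
                   n_new: int,
                   keep_fraction: float = 0.8,
                   seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples, each on the line segment
    between a randomly chosen candidate and the candidates' center."""
    rng = random.Random(seed)
    candidates = select_candidates(minority, keep_fraction)
    center = centroid(candidates)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(candidates)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([xi + gap * (ci - xi) for xi, ci in zip(x, center)])
    return synthetic


if __name__ == "__main__":
    # Four clustered minority points plus one outlier at (5, 5).
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.2], [5.0, 5.0]]
    for point in scsmote_sketch(minority, n_new=3, seed=42):
        print(point)
```

Because every synthetic point is a convex combination of a candidate and the candidates' center, it stays inside the region spanned by the candidates. That is one plausible reading of the abstract's claim about controlling sample quality. The oversampled minority set could then be concatenated with the majority class and passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.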