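Authors: Wu Lei (吴磊) [1], Fang Bin (房斌) [1], Diao Liping (刁丽萍) [2], Chen Jing (陈静) [1], Xie Nana (谢娜娜) [1]
Affiliations: [1] College of Computer Science, Chongqing University, Chongqing 400030; [2] Department of Health Management, Xinqiao Hospital, Third Military Medical University, Chongqing 400037
Source: Computer Engineering and Applications (《计算机工程与应用》), 2013, Issue 21, pp. 172-176, 185 (6 pages)
Funding: Fundamental Research Funds for the Central Universities (No. CDJXS10182216)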
Abstract: In machine learning, classifier performance is affected by many factors, and class imbalance in the training data is among the most severe. A training set is imbalanced when the examples of one class heavily outnumber those of the other. Of the many ways to handle imbalanced data, this paper considers only resampling applied as a preprocessing step to improve classification. Resampling algorithms fall into two groups: oversampling and undersampling. For binary classification, the paper proposes four resampling methods that combine oversampling with undersampling: BSM+Tomek, BSM+ENN, CBOS+Tomek and CBOS+ENN. These are compared experimentally against ten classical resampling algorithms. The experiments show that the four proposed methods improve the classification of imbalanced data under several evaluation metrics and give very good results on data sets with a small number of positive examples.
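Keywords: imbalanced data; resampling; Cluster-Based Oversampling (CBOS); Borderline-SMOTE (BSM); Edited Nearest Neighbor (ENN); Tomek links; preprocessing
Classification No.: TP391.41 [Automation and Computer Technology: Computer Application Technology]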
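The abstract names Tomek links as one of the cleaning steps applied after oversampling but does not define them. As a rough illustration only: under the standard definition, a Tomek link is a pair of examples from different classes that are each other's nearest neighbor. The sketch below finds and removes such pairs using the Python standard library. The function names and toy data are hypothetical, the oversampling step (BSM or CBOS) is not reproduced, and removing both members of each link is an assumption, since the abstract does not say which variant the paper uses.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link (assumed cleaning variant)."""
    drop = {k for pair in tomek_links(points, labels) for k in pair}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 2.0),
          (2.1, 2.0), (5.0, 5.0), (5.2, 5.0)]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

links = tomek_links(points, labels)
clean_points, clean_labels = remove_tomek_links(points, labels)
print(links)
print(clean_labels)
```

In this toy set the only cross-class mutual nearest neighbors are the points at (2.0, 2.0) and (2.1, 2.0), so cleaning removes that borderline pair and leaves the two well-separated clusters.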