Over-sampling algorithm based on preliminary classification in imbalanced data sets learning  Cited by: 11


Authors: Han Hui (韩慧)[1], Wang Lu (王路)[1], Wen Ming (温明)[1], Wang Wenyuan (王文渊)[1]

Affiliation: [1] Department of Automation, Tsinghua University, Beijing 100084

Source: Journal of Computer Applications (《计算机应用》), 2006, No. 8, pp. 1894-1897 (4 pages)

Abstract: To improve the classification performance of the minority class in imbalanced data sets, an over-sampling algorithm based on preliminary classification is proposed. First, a preliminary classification is performed on the test set, so as to retain as much useful information about the majority class as possible. Second, the samples that the preliminary classification predicts as minority class are classified again, to improve the classification performance of the minority class. Using data sets from the University of California, Irvine (UCI), the algorithm is compared experimentally with the synthetic minority over-sampling technique (SMOTE) and an under-sampling method. The results show that the proposed algorithm outperforms the other two methods in classification performance on both the minority class and the majority class.
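The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say which classifiers or training sets each stage uses, so the sketch assumes a nearest-centroid classifier trained on the original data for the preliminary stage and a 3-nearest-neighbour classifier trained on over-sampled data for the second stage. The over-sampling step is a simple interpolation between random minority pairs, not the paper's procedure. All function names (`centroid_predict`, `knn_predict`, `oversample`, `two_stage_predict`) and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid_predict(X, y, x):
    """Nearest-centroid prediction (assumed stage-1 classifier)."""
    centroids = {}
    for label in set(y):
        pts = [p for p, l in zip(X, y) if l == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))

def knn_predict(X, y, x, k=3):
    """k-nearest-neighbour majority vote (assumed stage-2 classifier)."""
    nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
    votes = [y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def oversample(X, y, minority, rng):
    """Add interpolated minority points until both classes are equal in size.

    Assumes a binary problem; this interpolation is a stand-in, not the
    over-sampling procedure from the paper.
    """
    pool = [p for p, l in zip(X, y) if l == minority]
    n_major = len(y) - len(pool)
    new_X, new_y = list(X), list(y)
    for _ in range(n_major - len(pool)):
        a, b = rng.choice(pool), rng.choice(pool)
        t = rng.random()
        new_X.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
        new_y.append(minority)
    return new_X, new_y

def two_stage_predict(train_X, train_y, test_X, minority=1, seed=0):
    """Return (preliminary labels, final labels) for the test samples."""
    rng = random.Random(seed)
    over_X, over_y = oversample(train_X, train_y, minority, rng)
    prelim, final = [], []
    for x in test_X:
        # Stage 1: preliminary classification on the original training data.
        label = centroid_predict(train_X, train_y, x)
        prelim.append(label)
        # Stage 2: only predicted-minority samples are classified again.
        if label == minority:
            label = knn_predict(over_X, over_y, x)
        final.append(label)
    return prelim, final

# Toy data: six majority points (label 0) and two minority points (label 1).
train_X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
           (0.0, 0.4), (0.4, 0.0), (1.0, 1.0), (1.2, 0.9)]
train_y = [0, 0, 0, 0, 0, 0, 1, 1]
test_X = [(0.1, 0.1), (1.1, 1.0), (0.7, 0.6)]

prelim, final = two_stage_predict(train_X, train_y, test_X)
print(prelim, final)
```

Samples that the preliminary stage assigns to the majority class keep that label. Only the predicted-minority samples reach the second classifier, which is the control flow the abstract outlines.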

Keywords: imbalanced data sets; over-sampling; under-sampling

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
