RS-MetaCost: An Imbalanced Classification Algorithm Combining MetaCost and Resampling  Cited by: 1

Authors: ZOU Chun-an (邹春安); WANG Jia-bao (王嘉宝); FU Guang-hui (付光辉)[1]

Affiliation: [1] School of Science, Kunming University of Science and Technology, Kunming, Yunnan 650500, China

Source: Software Guide (《软件导刊》), 2022, No. 3, pp. 34-41 (8 pages)

Funding: National Natural Science Foundation of China (11761041).

Abstract: Imbalanced classification is a current research focus and a difficult problem in machine learning. To improve classification performance on imbalanced data, this paper proposes RS-MetaCost, an imbalanced classification algorithm that combines MetaCost with resampling. First, before MetaCost partitions the data into subsets, the imbalanced dataset is resampled, either by oversampling the minority class or by undersampling the majority class, to reduce or eliminate the class imbalance. Second, in the probability prediction stage, m-estimation is used to raise the predicted probability of the minority class. RS-MetaCost is compared experimentally with classical algorithms on 6 simulated datasets and 10 real-world datasets. The results show that, on most datasets, RS-MetaCost improves classification accuracy on the minority class while maintaining high overall classification accuracy, and that RS-MetaCost with oversampling outperforms RS-MetaCost with undersampling.
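The two stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`random_oversample`, `m_estimate`, `min_cost_label`), the cost values, and the choice of prior and `m` are all assumptions made for illustration. The abstract does not give the exact m-estimation formula, so the sketch uses the standard m-estimate, (k + prior * m) / (n + m), together with the usual MetaCost relabeling rule that assigns each example the class with the lowest expected cost.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of smaller classes until every
    class has as many examples as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def m_estimate(k, n, prior, m):
    """Standard m-estimate: k of n votes for a class, shrunk toward `prior`
    with strength m. Assumed form; the abstract does not state the formula."""
    return (k + prior * m) / (n + m)


def min_cost_label(probs, cost):
    """MetaCost relabeling step: choose the class i that minimizes the
    expected cost sum_j probs[j] * cost[i][j], where cost[i][j] is the cost
    of predicting i when the true class is j."""
    classes = sorted(probs)
    return min(classes, key=lambda i: sum(probs[j] * cost[i][j] for j in classes))


if __name__ == "__main__":
    # Toy data: 8 majority (0) and 2 minority (1) examples.
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    Xb, yb = random_oversample(X, y)
    print(Counter(yb))

    # Illustrative cost matrix: missing a minority example costs 5.
    cost = {0: {0: 0, 1: 5}, 1: {0: 1, 1: 0}}

    # Suppose 1 of 10 ensemble members votes "minority" for some example.
    raw = 1 / 10
    smoothed = m_estimate(1, 10, prior=0.5, m=4)  # (1 + 0.5*4) / (10 + 4) = 3/14
    print(min_cost_label({0: 1 - raw, 1: raw}, cost))
    print(min_cost_label({0: 1 - smoothed, 1: smoothed}, cost))
```

In this toy setting the prior is set to 0.5 on the assumption that oversampling has balanced the classes, so the m-estimate pulls a low minority vote share upward, which can change the minimum-cost label. Undersampling the majority class would replace `random_oversample` with a helper that discards majority examples instead.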

Keywords: imbalanced classification; MetaCost; resampling; m-estimation

Classification code: TP301.6 [Automation and Computer Technology: Computer Systems Architecture]

 
