An Effective Over-sampling Method for Imbalanced Data Sets Classification  Cited by: 6

Authors: ZHAI Yun, MA Nan, RUAN Da, AN Bing

Affiliations: [1] School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, China; [2] School of Computer Science, Liaocheng University, Liaocheng 252059, China; [3] Information College, Beijing Union University, Beijing 100101, China; [4] Department of Applied Mathematics and Computer Science, Ghent University, 9000 Ghent, Belgium; [5] Belgian Nuclear Research Centre (SCK*CEN), 2400 Mol, Belgium

Source: Chinese Journal of Electronics (电子学报(英文版)), 2011, Issue 3, pp. 489-494 (6 pages)

Funding: This work is supported in part by the National Natural Science Foundation of China (No.60675030, No.60875029) and the Funding Project for Academic Human Resources Development (No.PHR(IHLB) 2010).

Abstract: Imbalanced data sets in real-world applications have a majority class of normal instances and a minority class of abnormal or important instances. Learning from such data sets usually produces biased classifiers with higher predictive accuracy on the majority class but much poorer predictive accuracy on the minority class. The Synthetic Minority Over-sampling Technique (SMOTE) is specifically designed for learning from imbalanced data sets. This paper presents a novel approach for learning from imbalanced data sets, based on an improved SMOTE algorithm. The approach handles noisy data with a hierarchical filtering mechanism, employs a selection strategy for the minority instances, and makes full use of the dynamic distribution density of the minority class, followed by the SMOTE process. Empirical analysis showed the approach to be quantitatively competitive with SMOTE and a series of its improved variants in terms of the receiver operating characteristic (ROC) curve when applied to several highly and moderately imbalanced data sets.
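For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of plain SMOTE interpolation, not the improved algorithm proposed in the paper. The function name `smote`, its parameters, and the use of Euclidean distance are assumptions made for illustration; the paper's hierarchical filtering, selection strategy, and density-based steps are not reproduced here.

```python
import numpy as np


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE sketch (hypothetical helper, not the paper's method).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_synthetic)
    picks = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # Interpolate at a random position along the connecting segment.
    gap = rng.random((n_synthetic, 1))
    return X[base] + gap * (X[picks] - X[base])


if __name__ == "__main__":
    minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(smote(minority, n_synthetic=3, k=2))
```

Because every synthetic point is a convex combination of two existing minority samples, noisy minority samples propagate directly into the generated data, which is presumably why the paper filters noise and selects minority instances before applying SMOTE.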

Keywords: Data mining; Classification; Imbalanced data sets; Selection strategy; Distribution density; Oversample

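Classification codes: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TH113.25 [Automation and Computer Technology: Computer Science and Technology]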

 
