A novel overlapping minimization SMOTE algorithm for imbalanced classification  


Authors: Yulin HE, Xuan LU, Philippe FOURNIER-VIGER, Joshua Zhexue HUANG

Affiliations: [1] Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, China; [2] College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

Source: Frontiers of Information Technology & Electronic Engineering (信息与电子工程前沿, English edition), 2024, No. 9, pp. 1266-1281 (16 pages)

Funding: Project supported by the National Natural Science Foundation of China (No. 61972261); the Natural Science Foundation of Guangdong Province, China (No. 2023A1515011667); the Key Basic Research Foundation of Shenzhen, China (No. JCYJ20220818100205012); and the Basic Research Foundation of Shenzhen, China (No. JCYJ20210324093609026).

Abstract: The synthetic minority oversampling technique (SMOTE) is a popular algorithm for reducing the impact of class imbalance when building classifiers, and it has received several enhancements over the past 20 years. SMOTE and its variants synthesize a number of minority-class sample points in the original sample space to alleviate the adverse effects of class imbalance. This approach works well in many cases, but problems arise when synthetic sample points are generated in overlapping areas between different classes, which further complicates classifier training. To address this issue, this paper proposes a novel minority-class sample point generation algorithm that is generalization-oriented rather than imputation-oriented, named overlapping minimization SMOTE (OM-SMOTE). The algorithm is designed specifically for binary imbalanced classification problems. OM-SMOTE first maps the original sample points into a new sample space by balancing sample encoding and classifier generalization. It then employs a set of sophisticated minority-class sample point imputation rules to generate synthetic sample points that are as far as possible from overlapping areas between classes. Extensive experiments were conducted on 32 imbalanced datasets to validate the effectiveness of OM-SMOTE. The results show that using OM-SMOTE to generate synthetic minority-class sample points leads to better training performance for naive Bayes, support vector machine, decision tree, and logistic regression classifiers than 11 state-of-the-art SMOTE-based imputation algorithms. This demonstrates that OM-SMOTE is a viable approach for supporting the training of high-quality classifiers for imbalanced classification. The implementation of OM-SMOTE is shared publicly on GitHub at https://github.com/luxuan123123/OM-SMOTE/.
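For orientation, the baseline interpolation step that SMOTE-style methods build on can be sketched as follows. This is a minimal illustration of classic SMOTE only, not of OM-SMOTE, whose mapping and imputation rules are not detailed in the abstract. The function name `smote_sketch` and its parameters are hypothetical and are not taken from the authors' repository. Because each synthetic point lies on a segment between two minority-class points, nothing in this baseline prevents it from landing in a region that also contains majority-class points, which is the overlap problem the abstract describes.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Classic SMOTE-style interpolation (baseline sketch, NOT OM-SMOTE).

    minority    -- list of minority-class points, each a list of floats
    n_synthetic -- number of synthetic points to generate
    k           -- number of nearest minority neighbours to sample from
    seed        -- seed for the local random generator (hypothetical default)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority-class points")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority-class point as the base.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Calling `smote_sketch(points, 10, k=2)` on a small list of two-dimensional minority points returns ten new points, each inside the bounding box of the originals.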

Keywords: Imbalanced classification; Synthetic minority oversampling technique (SMOTE); Majority-class sample point; Minority-class sample point; Generalization capability; Overlapping minimization

Classification code: TP301 [Automation and Computer Technology: Computer Systems Architecture]

 
