MCBC-SMOTE: A Majority Clustering Model for Classification of Imbalanced Data

Authors: Jyoti Arora, Meena Tushir, Keshav Sharma, Lalit Mohan, Aman Singh, Abdullah Alharbi, Wael Alosaimi

Affiliations: [1] Department of Information Technology, MSIT, GGSIPU, New Delhi, 110058, India [2] Department of Electrical and Electronic Engineering, MSIT, GGSIPU, New Delhi, 110058, India [3] School of Computer Science and Engineering, Lovely Professional University, 144411, Punjab, India [4] Department of Information Technology, College of Computers and Information Technology, Taif University, 11099, Taif 21944, Saudi Arabia

Source: Computers, Materials & Continua (English-language journal), 2022, Issue 12, pp. 4801-4817 (17 pages)

Funding: This research was supported by Taif University Researchers Supporting Project number (TURSP-2020/254), Taif University, Taif, Saudi Arabia.

Abstract: Datasets with an imbalanced class distribution are difficult to handle with standard classification algorithms. In supervised learning, class imbalance is still considered a challenging research problem. Various machine learning techniques are designed to operate on balanced datasets; therefore, different under-sampling, over-sampling and hybrid strategies have been proposed in the state of the art to deal with imbalanced datasets, but highly skewed datasets still pose problems of generalization and noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for classification of imbalanced datasets known as MCBC-SMOTE (Majority Clustering for Balanced Classification-SMOTE). The model provides a method to convert the problem of binary classification into a multi-class problem. In the proposed algorithm, the number of clusters for the majority class is calculated using the elbow method, and the minority class is over-sampled as an average of clustered majority classes to generate a symmetrical class distribution. The proposed technique is cost-effective, reduces the problem of noise generation and successfully removes the imbalances present between and within classes. Evaluations on diverse real datasets show better classification results than existing state-of-the-art methodologies on several performance metrics.
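The pipeline outlined in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. Three points are assumptions made here for illustration: the elbow is chosen automatically as the largest second difference of the k-means inertia curve, "over-sampled as an average of clustered majority classes" is read as growing the minority class to the average majority-cluster size, and all function names (`kmeans`, `elbow_k`, `smote`, `mcbc_smote_sketch`) are hypothetical. Points are tuples of floats, and the minority class needs at least two points.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples; returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    labels = [nearest(p) for p in points]
    inertia = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return labels, inertia


def elbow_k(points, k_max=6, seed=0):
    """Assumed elbow heuristic: k with the largest second difference of inertia."""
    inertia = {k: kmeans(points, k, seed=seed)[1] for k in range(1, k_max + 1)}
    bend = {k: inertia[k - 1] - 2 * inertia[k] + inertia[k + 1] for k in range(2, k_max)}
    return max(bend, key=bend.get)


def smote(minority, n_new, k_neighbors=5, seed=0):
    """SMOTE-style interpolation between a minority point and a near minority neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = sorted((p for p in minority if p is not base),
                        key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def mcbc_smote_sketch(majority, minority, k_max=6, seed=0):
    """Cluster the majority class, relabel clusters as classes, oversample the minority."""
    k = elbow_k(majority, k_max, seed)
    labels, _ = kmeans(majority, k, seed=seed)
    sizes = Counter(labels)
    target = round(sum(sizes.values()) / len(sizes))  # average majority-cluster size
    extra = smote(minority, max(0, target - len(minority)), seed=seed)
    X = list(majority) + list(minority) + extra
    # Class 0 is the minority; classes 1..k are the majority clusters.
    y = [l + 1 for l in labels] + [0] * (len(minority) + len(extra))
    return X, y


# Toy data: three majority blobs and a small minority blob (made up for illustration).
rng = random.Random(42)
majority = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (6, 0), (3, 6)] for _ in range(60)]
minority = [(rng.gauss(3, 0.4), rng.gauss(2, 0.4)) for _ in range(12)]
X, y = mcbc_smote_sketch(majority, minority)
print(Counter(y))  # class sizes after relabelling and oversampling
```

The relabelled output `(X, y)` can be passed to any multi-class classifier. To recover the original binary decision, every predicted class other than 0 would be mapped back to "majority".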

Keywords: Imbalance class problem; classification; SMOTE; k-means; clustering; sampling

Classification code: O17 [Science: Mathematics]

 
