LDBSMOTE Oversampling Method for Imbalanced Data Sets Classification

Cited by: 7

Authors: Wang Yongxin; Zhang Dabin; Che Daqing; Lyu Jianqiu (College of Mathematics and Informatics, South China Agricultural University; Guangdong Academy of Science and Technology Management and Planning, Guangzhou 510642, China)

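Institutions: [1] College of Mathematics and Informatics, South China Agricultural University; [2] Guangdong Academy of Science and Technology Management and Planning, Guangzhou 510642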

Source: Statistics & Decision, 2022, No. 18, pp. 58-63 (6 pages)

Funding: National Natural Science Foundation of China, General Program (71971089).

Abstract: Traditional SMOTE and BSMOTE oversampling methods can lower the recognition rate of majority-class samples. To address this problem, this paper proposes LDBSMOTE, an improved BSMOTE algorithm based on local density. First, local density values are computed according to the characteristics of the sample distribution and used to screen root samples, so that samples with potential value are retained as far as possible. SMOTE is then used to synthesize new samples, and finally an ensemble learning algorithm performs the classification. To verify the effectiveness of LDBSMOTE, experiments were conducted on 15 public data sets. The results show that, compared with SMOTE and BSMOTE, LDBSMOTE improves F1, G-mean and AUC by an average of 2.25% and achieves the highest average score on all three metrics. It raises the recognition rate of minority-class samples while preserving that of majority-class samples, effectively improving classification performance.
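The abstract describes LDBSMOTE only at the level of its stages, so the following is a minimal Python sketch of the first two (root-sample screening and SMOTE synthesis) plus the F1 and G-mean metrics. It is not the authors' implementation. It assumes BSMOTE refers to Borderline-SMOTE and uses that method's usual labelling: a minority sample whose m nearest neighbours are all majority-class is treated as noise, and one with at least m/2 (but fewer than m) majority neighbours is a borderline "danger" sample. The local-density score (inverse mean distance to the nearest minority neighbours) and the rule that rescues a noise sample whose density reaches the median are hypothetical stand-ins for the paper's formula, which the abstract does not give. All function names (`k_nearest`, `local_density`, `select_roots`, `smote`, `f1_gmean`) are illustrative, and the ensemble-learning classification stage is omitted.

```python
import math
import random


def k_nearest(p, candidates, X, k):
    """Return up to k indices from `candidates` nearest to X[p] (p itself excluded)."""
    pool = [j for j in candidates if j != p]
    pool.sort(key=lambda j: math.dist(X[p], X[j]))  # stable sort: ties keep index order
    return pool[:k]


def local_density(p, candidates, X, k):
    """HYPOTHETICAL density score: 1 / mean distance to the k nearest candidates."""
    nbrs = k_nearest(p, candidates, X, k)
    mean_d = sum(math.dist(X[p], X[j]) for j in nbrs) / len(nbrs)
    return 1.0 / (mean_d + 1e-12)


def select_roots(X, y, m=5, minority=1):
    """Borderline-SMOTE labelling plus a HYPOTHETICAL density-based rescue of noise samples."""
    all_idx = range(len(X))
    min_idx = [i for i in all_idx if y[i] == minority]
    if len(min_idx) < 2:
        raise ValueError("need at least two minority samples")
    danger, noise = [], []
    for i in min_idx:
        n_major = sum(1 for j in k_nearest(i, all_idx, X, m) if y[j] != minority)
        if n_major == m:
            noise.append(i)   # all m neighbours are majority: Borderline-SMOTE drops these
        elif n_major >= m / 2:
            danger.append(i)  # borderline sample: Borderline-SMOTE oversamples these
    # Assumed LD step: density is measured among minority samples only, and a noise
    # sample is kept as a root if its density reaches the (upper) median minority density.
    dens = {i: local_density(i, min_idx, X, m) for i in min_idx}
    median = sorted(dens.values())[len(dens) // 2]
    rescued = [i for i in noise if dens[i] >= median]
    return sorted(danger + rescued), min_idx


def smote(roots, min_idx, X, n_new, k=5, seed=0):
    """SMOTE interpolation from root samples: x_new = x + u * (x_nn - x), u ~ U(0, 1)."""
    if not roots:
        return []
    rng = random.Random(seed)
    synthetic = []
    for t in range(n_new):
        r = roots[t % len(roots)]                     # cycle through the root samples
        nn = rng.choice(k_nearest(r, min_idx, X, k))  # random minority-class neighbour
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(X[r], X[nn])))
    return synthetic


def f1_gmean(y_true, y_pred, positive=1):
    """F1 of the minority (positive) class and G-mean = sqrt(recall * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, math.sqrt(recall * specificity)


if __name__ == "__main__":
    X = [(0, 0), (1, 0), (0, 1), (1, 1),                         # majority (y = 0)
         (0.5, 0.5), (5, 5), (5, 6), (6, 5), (6, 6), (2, 2)]     # minority (y = 1)
    y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    roots, min_idx = select_roots(X, y, m=3)
    print(roots)  # [9]
    print(smote(roots, min_idx, X, n_new=4, k=3))
    print(f1_gmean([1, 1, 0, 0], [1, 0, 0, 0]))  # approx. (0.667, 0.707)
```

In the toy data, the minority point (0.5, 0.5) lies inside the majority cluster and is labelled noise (its minority-class density is below the median, so it is not rescued), the (5, 5) to (6, 6) cluster is safe, and (2, 2) is the only borderline root, so every synthetic point falls on a segment from (2, 2) toward one of its minority neighbours. Substituting the paper's actual density measure would only change `local_density` and the rescue rule in `select_roots`.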

Keywords: imbalanced data sets; local density; SMOTE; ensemble learning

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]