An Oversampling Algorithm of Enhanced Minority Instance Boundary in Multi-class Imbalanced Datasets


Author: CAO Lan (曹兰), Electronic Engineering Department, Zhangzhou Institute of Technology, Zhangzhou 363000, China

Affiliation: [1] Electronic Engineering Department, Zhangzhou Institute of Technology, Zhangzhou 363000, Fujian, China

Source: Journal of Sichuan University of Science & Engineering (Natural Science Edition), 2021, No. 6, pp. 85-91 (7 pages)

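Funding: Fujian Province Education and Research Project for Young and Middle-aged Teachers (Science and Technology category, JAT191419).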

Abstract: Oversampling methods for classifying multi-class imbalanced datasets help balance the instances across classes and improve classification accuracy. However, generating synthetic instances during oversampling faces two main problems. The first is how to distinguish the importance of the limited instances in each minority class when generating synthetic instances. The second is whether the boundary between the majority class and the minority classes can be divided more clearly after the synthetic instances are generated. To address these problems, a method for enhancing the boundary of minority instances in multi-class imbalance (MEBMI) is proposed. Because boundary instances within the minority classes play an important role in classification, minority instances closer to the boundary are given greater weight. As a result, more synthetic minority instances are generated at the boundary, which further strengthens the minority-class boundary. At the same time, the learning bias of the majority-class instances is overcome, and the multi-class imbalanced data finally reach a certain degree of balance. The experimental results show that the algorithm can distinguish the importance of each minority-class instance in generating synthetic instances and can separate the boundary between the majority class and the minority classes more clearly. It achieves good results on four commonly used evaluation metrics for imbalanced data classification: precision, recall, F-measure, and G-mean.
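The abstract describes the core idea only in outline, so the following is a minimal sketch of boundary-weighted oversampling rather than the MEBMI algorithm itself. It rests on several assumptions that do not come from the paper: closeness to the boundary is measured as the inverse of the mean distance to the `k` nearest instances of other classes, synthetic instances are produced by SMOTE-style interpolation between minority neighbours, and the function names `boundary_weights` and `oversample_minority` are hypothetical.

```python
import numpy as np


def boundary_weights(X_min, X_other, k=5):
    """Assumed weighting: minority instances nearer to other classes weigh more.

    The weight is the normalised inverse of the mean distance to the k nearest
    instances of the other classes. The abstract does not give the paper's
    actual weighting formula.
    """
    # Pairwise Euclidean distances, shape (n_min, n_other).
    d = np.linalg.norm(X_min[:, None, :] - X_other[None, :, :], axis=2)
    k = min(k, X_other.shape[0])
    nearest = np.sort(d, axis=1)[:, :k]
    closeness = 1.0 / (nearest.mean(axis=1) + 1e-12)
    return closeness / closeness.sum()


def oversample_minority(X_min, X_other, n_new, k=5, seed=0):
    """Generate n_new synthetic minority instances, concentrated at the boundary.

    Requires at least two minority instances. Seeds are drawn with probability
    proportional to their boundary weight, then interpolated toward one of
    their k nearest minority neighbours (SMOTE-style, an assumption here).
    """
    rng = np.random.default_rng(seed)
    w = boundary_weights(X_min, X_other, k)
    seeds = rng.choice(len(X_min), size=n_new, p=w)

    # k nearest minority neighbours of every minority instance (excluding itself).
    d_mm = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_mm, np.inf)
    k_mm = min(k, len(X_min) - 1)
    nbrs = np.argsort(d_mm, axis=1)[:, :k_mm]

    # Pick one neighbour per seed and interpolate at a random position.
    partners = nbrs[seeds, rng.integers(0, k_mm, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[seeds] + gap * (X_min[partners] - X_min[seeds])
```

In a multi-class setting, one plausible use is to call `oversample_minority` once per minority class, passing the instances of all remaining classes as `X_other` and choosing `n_new` so that the class sizes become comparable.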

Keywords: data mining; oversampling; classification; evaluation metrics

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
