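Author: CAO Lan (曹兰) (Electronic Engineering Department, Zhangzhou Institute of Technology, Zhangzhou 363000, China)
Affiliation: [1] Electronic Engineering Department, Zhangzhou Institute of Technology, Zhangzhou, Fujian 363000, China
Source: Journal of Sichuan University of Science & Engineering (Natural Science Edition), 2021, No. 6, pp. 85-91 (7 pages)
Funding: Fujian Province Education and Research Project for Young and Middle-aged Teachers (Science and Technology, JAT191419).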
Abstract: Over-sampling classification methods for multi-class imbalanced data help balance the instances across classes and improve classification accuracy. However, generating synthetic instances during over-sampling raises two main problems. The first is how to distinguish the importance of each of the limited instances in a minority class when generating synthetic instances. The second is whether the boundary between the majority class and the minority class can be drawn more clearly after the synthetic instances are generated. To address these problems, a method to enhance the boundary of minority instances in multi-class imbalance (MEBMI) is proposed. The idea is that boundary instances among the minority instances play an important role in classification, so minority instances closer to the boundary are given larger weights. As a result, more synthetic minority instances are generated at the boundary, which further strengthens the boundary of the minority class. At the same time, the learning bias of the majority instances is overcome, and the multi-class imbalanced data finally reach a certain degree of balance. The experimental results show that the algorithm distinguishes the importance of each minority class instance in generating synthetic instances and separates the majority class from the minority class more clearly. On four commonly used evaluation metrics for imbalanced data classification (precision, recall, F-measure, and G-mean), it achieves good results.
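The weighting idea in the abstract can be illustrated with a small sketch. This is not the MEBMI algorithm itself, whose details are not given in this record; it is a generic boundary-weighted over-sampler in the spirit of Borderline-SMOTE and ADASYN. The function names `boundary_weights` and `oversample`, the parameter `k`, the use of the share of other-class neighbours as the "closeness to the boundary" measure, and the small smoothing constant are all assumptions made for illustration.

```python
import numpy as np

def boundary_weights(X, y, minority_label, k=5):
    """Return minority indices and normalised weights (hypothetical helper).

    Assumption: an instance counts as closer to the boundary when a larger
    share of its k nearest neighbours belongs to other classes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx = np.flatnonzero(y == minority_label)
    # Distances from every minority instance to every instance.
    d = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(idx)), idx] = np.inf  # exclude each point itself
    nn = np.argsort(d, axis=1)[:, :k]
    other_share = (y[nn] != minority_label).mean(axis=1)
    w = other_share + 1e-6  # assumed smoothing so no weight is exactly zero
    return idx, w / w.sum()

def oversample(X, y, minority_label, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points, favouring boundary seeds."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    idx, w = boundary_weights(X, y, minority_label, k)
    Xm = X[idx]
    seeds = rng.choice(len(idx), size=n_new, p=w)    # boundary-weighted draw
    mates = rng.integers(0, len(idx), size=n_new)    # random same-class partner
    gaps = rng.random((n_new, 1))                    # interpolation factor in [0, 1)
    return Xm[seeds] + gaps * (Xm[mates] - Xm[seeds])
```

Because seeds are drawn in proportion to their weights, most synthetic points start from minority instances that sit next to other classes, which is the effect the abstract describes. A multi-class setting would call `oversample` once per minority class.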
Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]