A synthetic algorithm of advantaged boundary for minority class samples based on weighted distance

Authors: HE Tianzhong[1,2]; ZHENG Yifeng; HU Minjie[1,2]

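Affiliations: [1] Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou, Fujian 363000, China; [2] School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China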

Source: Journal of Minnan Normal University: Natural Science, 2024, No. 1, pp. 54-64 (11 pages)

Funding: National Natural Science Foundation of China (62376114); Natural Science Foundation of Fujian Province (2021J011003, 2021J011004, 2021J011006).

Abstract: This paper proposes ABWD, an advantaged-boundary minority-class sample synthesis algorithm based on weighted distance, to address class imbalance in data sets. ABWD has three characteristics: 1) it defines a weighted distance and selects each sample's nearest neighbors using that distance; 2) it uses a sample's nearest neighbors to decide whether the sample is a boundary sample of the minority class; 3) for each minority-class boundary sample, it determines where and how many synthetic samples to generate so that, after synthesis, the minority-class samples among its neighbors are no fewer than the majority-class samples, which gives that sample an advantaged boundary. Experimental results show that, compared with other typical oversampling algorithms, ABWD considerably improves minority-class classification performance and achieves good results on three metrics: G-mean, F-measure, and recall.
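The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighted distance, the boundary rule, or the placement rule, so everything below is an assumption made for illustration. The function names `weighted_distance` and `abwd_sketch`, the per-feature weight vector `w`, the neighborhood size `k`, the rule "boundary means majority neighbors outnumber minority neighbors", and the placement of synthetic points on the segment toward the nearest minority sample are all hypothetical. The sketch also assumes non-negative weights and more than `k + 1` samples.

```python
import numpy as np

def weighted_distance(A, b, w):
    """Feature-weighted Euclidean distance (an assumed form; the abstract
    does not define the weighted distance)."""
    return np.sqrt(np.sum(w * (A - b) ** 2, axis=-1))

def abwd_sketch(X, y, minority=1, k=5, w=None, seed=0):
    """Hypothetical sketch: oversample minority boundary samples until the
    minority class is not outnumbered among each one's k nearest neighbors."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.ones(X.shape[1]) if w is None else np.asarray(w, dtype=float)
    is_min = (y == minority)
    idx = np.arange(len(y))
    new_pts = []
    for i in np.flatnonzero(is_min):
        # Step 1: weighted distances from sample i to every original sample.
        d = weighted_distance(X, X[i], w)
        d[i] = np.inf  # never count the sample as its own neighbor
        others = np.flatnonzero(is_min & (idx != i))
        if others.size == 0:
            continue  # no second minority sample to interpolate toward
        target = others[np.argmin(d[others])]  # nearest minority sample
        local = []  # distances of the synthetic points created for sample i
        for _ in range(k):  # guard: at most k synthetic points per sample
            dist = np.concatenate([d, np.array(local, dtype=float)])
            lab = np.concatenate([is_min, np.ones(len(local), dtype=bool)])
            nn = np.argsort(dist)[:k]
            n_min = int(lab[nn].sum())
            n_maj = len(nn) - n_min
            # Step 2 (assumed rule): boundary if majority neighbors dominate.
            if n_maj <= n_min:
                break
            # Step 3 (assumed rule): place a point on the segment toward the
            # nearest minority sample, closer than the nearest majority neighbor.
            r = dist[nn][~lab[nn]].min()
            scale = 1.0 if d[target] == 0 else min(1.0, r / d[target])
            t = rng.uniform(0.1, 0.9) * scale
            new_pts.append(X[i] + t * (X[target] - X[i]))
            local.append(t * d[target])
    if not new_pts:
        return X.copy(), y.copy()
    X_aug = np.vstack([X, np.array(new_pts)])
    y_aug = np.concatenate([y, np.full(len(new_pts), minority, dtype=y.dtype)])
    return X_aug, y_aug
```

In this sketch the number of synthetic samples is not computed in closed form. Points are added one at a time until the neighborhood check passes, which is one simple way to meet the "no fewer than the majority class" condition stated in the abstract.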

Keywords: data mining; imbalanced data; oversampling; advantaged boundary; weighted distance

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
