Improved proximity weighted synthetic oversampling technique

Authors: XING Sheng (邢胜); WANG Xiaolan (王晓兰) [2]; SHEN Jiaxing (沈家星); ZHU Meiling (朱美玲); CAO Yongqing (曹永青); HE Yulin (何玉林)

Affiliations: [1] College of Computer Science and Engineering, Cangzhou Normal University, Cangzhou 061001, Hebei Province, P.R. China; [2] Department of Information Engineering, Cangzhou Technical College, Cangzhou 061001, Hebei Province, P.R. China; [3] Department of Computing and Decision Sciences, Lingnan University, Hong Kong, P.R. China; [4] Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, Guangdong Province, P.R. China

Source: Journal of Shenzhen University (Science and Engineering), 2024, Issue 6, pp. 748-755 (8 pages)

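Funding: Scientific Research Project of Higher Education Institutions of Hebei Province (ZC2022071); Cangzhou Normal University Internal Research Fund (xnjjl1904); Natural Science Foundation of Guangdong Province (2023A1515011667); Guangdong Basic and Applied Basic Research Foundation (2023B1515120020).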

Abstract: The proximity weighted synthetic oversampling technique (ProWSyn) does not remove noisy samples when it synthesizes new samples, and when its smoothing factor takes values in [0,1], the resulting weight ratios cannot cover the entire search space. To address these limitations, an improved proximity weighted synthetic oversampling technique (IProWSyn) is proposed. IProWSyn changes the weight computation strategy by introducing an ordinary exponential function whose base lies in (0,1]; dynamically varying the base lets the weights cover a larger part of the search space, which makes it possible to find better weights. IProWSyn, ASN-SMOTE, and ProWSyn are applied to six imbalanced datasets (ada, ecoli1, glass1, haberman, Pima, and yeast1), and the methods are evaluated with a k-nearest neighbors (kNN) classifier and a neural network classifier. Experimental results show that, on most datasets, IProWSyn achieves higher F1, geometric mean (G-mean), and area under curve (AUC) values than the other oversampling methods. These results suggest that IProWSyn gives better overall classification and generalization performance on these datasets.
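The abstract does not state the weight formulas, so the following is only a minimal sketch of the idea under stated assumptions. It assumes that ProWSyn assigns proximity level l the weight exp(-θ(l-1)) with smoothing factor θ, and that the modified scheme uses b^(l-1) with base b in (0,1]; the function names `prowsyn_level_weights`, `base_level_weights`, and `consecutive_ratio` are hypothetical and not taken from the paper. Under these assumptions, θ in [0,1] limits the ratio between adjacent level weights to the interval [1/e, 1], while a base in (0,1] lets that ratio reach any value in (0,1].

```python
import math


def prowsyn_level_weights(theta, num_levels):
    """Normalized weights exp(-theta * (l - 1)) for levels l = 1..num_levels.

    Assumed form of the ProWSyn level weight; theta is the smoothing factor.
    """
    raw = [math.exp(-theta * level) for level in range(num_levels)]
    total = sum(raw)
    return [w / total for w in raw]


def base_level_weights(base, num_levels):
    """Normalized weights base ** (l - 1) for levels l = 1..num_levels.

    Hypothetical stand-in for the exponential-with-variable-base weighting
    described in the abstract; base must lie in (0, 1].
    """
    if not 0.0 < base <= 1.0:
        raise ValueError("base must lie in (0, 1]")
    raw = [base ** level for level in range(num_levels)]
    total = sum(raw)
    return [w / total for w in raw]


def consecutive_ratio(weights):
    """Ratio of the second level's weight to the first level's weight."""
    return weights[1] / weights[0]


if __name__ == "__main__":
    # With theta restricted to [0, 1], the ratio never drops below exp(-1).
    for theta in (0.0, 0.5, 1.0):
        ratio = consecutive_ratio(prowsyn_level_weights(theta, 5))
        print(f"theta={theta:.1f} ratio={ratio:.3f}")
    # A base in (0, 1] can push the ratio arbitrarily close to 0.
    for base in (0.05, 0.2, 0.6, 1.0):
        ratio = consecutive_ratio(base_level_weights(base, 5))
        print(f"base={base:.2f} ratio={ratio:.3f}")
```

In this sketch a search over the base, for example a grid over (0,1], would explore weight profiles that no θ in [0,1] can produce, which is one reading of the abstract's claim about covering a larger search space.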

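Keywords: artificial intelligence; imbalanced data; proximity weighted synthetic oversampling technique; oversampling method; k-nearest neighbor classifier; neural network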

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
