Authors: XING Sheng; WANG Xiaolan [2]; SHEN Jiaxing; ZHU Meiling; CAO Yongqing; HE Yulin
Affiliations: [1] College of Computer Science and Engineering, Cangzhou Normal University, Cangzhou 061001, Hebei Province, P.R. China; [2] Department of Information Engineering, Cangzhou Technical College, Cangzhou 061001, Hebei Province, P.R. China; [3] Department of Computing and Decision Sciences, Lingnan University, Hong Kong, P.R. China; [4] Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, Guangdong Province, P.R. China
Source: Journal of Shenzhen University (Science and Engineering), 2024, Issue 6, pp. 748-755 (8 pages)
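Funding: Scientific Research Project of Higher Education Institutions of Hebei Province (ZC2022071); Cangzhou Normal University Intramural Research Fund (xnjjl1904); Natural Science Foundation of Guangdong Province (2023A1515011667); Guangdong Basic and Applied Basic Research Foundation (2023B1515120020).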
Abstract: The proximity weighted synthetic oversampling technique (ProWSyn) has two limitations: it does not remove noise samples when generating synthetic samples, and when the smoothing factor takes values in [0, 1], the resulting weight ratios cannot easily cover the entire search space. To address these limitations, an improved proximity weighted synthetic oversampling technique (IProWSyn) is proposed. IProWSyn changes how the weights are computed by introducing an ordinary exponential function whose base lies in (0, 1]. Varying the base dynamically lets the weights cover a larger portion of the search space, so that better weights can be found. IProWSyn, ASN-SMOTE, and ProWSyn are applied to six imbalanced datasets (ada, ecoli1, glass1, haberman, Pima, and yeast1), and the effectiveness of the method is verified with a k-nearest neighbors (kNN) classifier and a neural network classifier. Experimental results show that, on most datasets, IProWSyn achieves higher F1, geometric mean (G-mean), and area under curve (AUC) values than the other oversampling methods. These results suggest that IProWSyn gives classifiers better overall classification and generalization performance on these datasets.
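The abstract does not state the weight formulas, so the following is only a rough sketch under stated assumptions. It assumes that ProWSyn gives proximity level p the unnormalized weight exp(-θ(p-1)) for a smoothing factor θ, and that the improved method instead uses a^(p-1) for a base a in (0, 1]. Both function names are hypothetical, and the noise-removal step mentioned in the abstract is not modeled.

```python
import math


def prowsyn_style_weights(theta, levels):
    """Assumed ProWSyn-style weights: exp(-theta * (p - 1)), normalized to sum to 1."""
    raw = [math.exp(-theta * (p - 1)) for p in range(1, levels + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def exponential_base_weights(base, levels):
    """Assumed IProWSyn-style weights: base ** (p - 1) with base in (0, 1], normalized."""
    if not 0.0 < base <= 1.0:
        raise ValueError("base must lie in (0, 1]")
    raw = [base ** (p - 1) for p in range(1, levels + 1)]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    # Ratio between consecutive levels: exp(-theta) in the first scheme, base in the second.
    for theta in (0.0, 0.5, 1.0):
        w = prowsyn_style_weights(theta, levels=5)
        print(f"theta={theta}: ratio={w[1] / w[0]:.3f}")
    for base in (0.05, 0.5, 1.0):
        w = exponential_base_weights(base, levels=5)
        print(f"base={base}: ratio={w[1] / w[0]:.3f}")
```

Under these assumptions, the ratio between consecutive weights is exp(-θ), which stays within roughly [0.368, 1] for θ in [0, 1], whereas a base a in (0, 1] lets that ratio take any value in (0, 1]. This is one way to read the abstract's claim that the modified weights cover a larger search space.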
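Keywords: artificial intelligence; imbalanced data; proximity weighted synthetic oversampling technique; oversampling method; k-nearest neighbors classifier; neural network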
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]