Authors: DU Ruishan [1,2], JIN Mingyang, MENG Lingdong [2], SONG Jianhui
Affiliations: [1] Department of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China; [2] Key Laboratory of Oil and Gas Reservoir and Underground Gas Storage Integrity Evaluation, Northeast Petroleum University, Daqing 163318, Heilongjiang, China
Source: Journal of Zhengzhou University: Natural Science Edition, 2025, No. 1, pp. 23-30 (8 pages)
Funding: Natural Science Foundation of Heilongjiang Province (LH2021F004).
Abstract: The synthetic minority over-sampling technique (SMOTE) algorithm has a narrow sample synthesis region, tends to generalize the minority class into the majority class, and can introduce noise. To address these problems, an oversampling algorithm based on noise filtering and adaptive sampling of boundary points is proposed. First, the algorithm filters noise with the K-nearest neighbors algorithm. It then identifies the boundary points, selects suitable points among them as root samples, and, for each root sample, takes the point among its K nearest neighbors that belongs to the same class and has the largest Euclidean distance as the candidate sample. Next, the number of samples to synthesize from each root sample is determined from the boundary information that the root sample carries, and an N-dimensional sphere generated from the root sample and the candidate sample serves as the synthesis region. Finally, each synthesized sample is checked to determine whether it satisfies the conditions. Experimental results show that the samples generated by the algorithm are of higher quality than those of SMOTE and its common variants.
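Keywords: SMOTE; KNN; oversampling algorithm; imbalanced data; ISMOTE
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]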
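The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact rules, so every concrete choice here is an assumption rather than the authors' method: a minority point counts as noise when all of its k neighbors are majority; a boundary (root) point is a minority point with a mixed neighborhood; the per-root sample count is proportional to its share of majority neighbors; the sphere is centered at the midpoint between root and candidate with half their distance as radius; and a synthetic point is accepted when its nearest retained point is minority. The names `knn`, `sample_in_ball`, and `boundary_oversample` are hypothetical.

```python
import numpy as np


def knn(X, k):
    """Indices of each row's k nearest other rows (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the N-dimensional ball (center, radius)."""
    direction = rng.normal(size=center.size)
    direction /= np.linalg.norm(direction)
    return center + radius * rng.random() ** (1.0 / center.size) * direction


def boundary_oversample(X, y, minority=1, k=5, n_new=100, max_tries=50, seed=0):
    """Return synthetic minority samples (possibly fewer than n_new)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)

    # Noise filtering (assumed rule): drop minority points whose
    # k nearest neighbors are all majority.
    maj_frac = (y[knn(X, k)] != minority).mean(axis=1)
    keep = ~((y == minority) & (maj_frac == 1.0))
    X, y = X[keep], y[keep]

    # Boundary points (assumed rule): minority points with both
    # majority and minority neighbors become root samples.
    nbrs = knn(X, k)
    maj_frac = (y[nbrs] != minority).mean(axis=1)
    roots = np.flatnonzero((y == minority) & (maj_frac > 0) & (maj_frac < 1))
    if roots.size == 0:
        return np.empty((0, X.shape[1]))

    # Adaptive allocation (assumed rule): roots with more majority
    # neighbors receive more synthetic samples.
    counts = rng.multinomial(n_new, maj_frac[roots] / maj_frac[roots].sum())

    synthetic = []
    for r, count in zip(roots, counts):
        same = nbrs[r][y[nbrs[r]] == minority]
        dist = np.linalg.norm(X[same] - X[r], axis=1)
        cand = same[np.argmax(dist)]  # farthest same-class neighbor
        # Assumed sphere: midpoint as center, half the distance as radius.
        center, radius = (X[r] + X[cand]) / 2.0, dist.max() / 2.0
        for _ in range(count):
            for _ in range(max_tries):
                p = sample_in_ball(center, radius, rng)
                # Acceptance check (assumed rule): nearest retained point
                # must belong to the minority class.
                if y[np.argmin(np.linalg.norm(X - p, axis=1))] == minority:
                    synthetic.append(p)
                    break
    return np.array(synthetic).reshape(-1, X.shape[1])
```

Calling `boundary_oversample(X, y, minority=1, k=5, n_new=20)` on a labeled feature matrix returns an array of accepted synthetic minority points. Sampling inside a ball, rather than on the line segment between two points as in SMOTE, is what widens the synthesis region; the acceptance check is one simple way to keep new points out of majority territory.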