Adaptive Sampling Algorithm Based on Border Information


Authors: DU Ruishan [1,2]; JIN Mingyang; MENG Lingdong [2]; SONG Jianhui

Affiliations: [1] Department of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China; [2] Key Laboratory of Oil and Gas Reservoir and Underground Gas Storage Integrity Evaluation, Northeast Petroleum University, Daqing 163318, Heilongjiang, China

Source: Journal of Zhengzhou University (Natural Science Edition), 2025, No. 1, pp. 23-30 (8 pages)

Funding: Natural Science Foundation of Heilongjiang Province (LH2021F004).

Abstract: The synthetic minority over-sampling technique (SMOTE) algorithm synthesizes samples within a narrow region, tends to generalize the minority class into the majority class, and can introduce noise. To address these problems, an oversampling algorithm based on noise filtering and adaptive sampling of boundary points is proposed. First, the algorithm filters noise using the K-nearest neighbors algorithm. Next, it identifies the boundary points and selects suitable points among them as root samples; for each root sample, the candidate sample is the point of the same class among its K nearest neighbors that has the largest Euclidean distance from it. Then, the number of samples to synthesize for each root sample is determined from the boundary information that the root sample carries, and an N-dimensional sphere generated from the root sample and the candidate sample serves as the synthesis region. Finally, each synthesized sample is checked to determine whether it satisfies the acceptance conditions. Experiments show that the samples generated by the proposed algorithm are of higher quality than those generated by SMOTE and its common variants.
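The four steps in the abstract can be sketched in Python as follows. This is only an illustrative reading, not the authors' implementation: the abstract does not state the noise rule, the boundary test, the allocation formula, the exact sphere, or the acceptance condition, so every rule marked "assumed" below is a placeholder chosen for this sketch, and the names `k_nearest`, `sample_in_ball`, and `border_oversample` are hypothetical.

```python
import math
import random


def k_nearest(p, X, k, skip=None):
    """Indices of the k points of X closest to p (Euclidean), optionally skipping one index."""
    order = sorted((j for j in range(len(X)) if j != skip),
                   key=lambda j: math.dist(p, X[j]))
    return order[:k]


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the N-dimensional ball around center."""
    n = len(center)
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    r = radius * rng.random() ** (1.0 / n)
    return [c + r * d / norm for c, d in zip(center, direction)]


def border_oversample(X, y, minority=1, k=5, seed=0, max_tries=20):
    """Return synthetic minority samples; every decision rule here is an assumption."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    neigh = {i: k_nearest(X[i], X, k, skip=i) for i in min_idx}
    n_major = {i: sum(y[j] != minority for j in neigh[i]) for i in min_idx}

    # Step 1 (assumed rule): a minority point whose neighbours are all majority is noise.
    clean = {i for i in min_idx if n_major[i] < len(neigh[i])}
    # Step 2 (assumed rule): border points have at least one majority neighbour.
    roots = [i for i in sorted(clean) if n_major[i] > 0]
    deficit = (len(y) - len(min_idx)) - len(clean)
    total_weight = sum(n_major[i] for i in roots)
    if deficit <= 0 or total_weight == 0:
        return []

    synthetic = []
    for i in roots:
        # Candidate: farthest same-class point among the root's k nearest neighbours.
        same = [j for j in neigh[i] if j in clean]
        if not same:
            continue
        cand = max(same, key=lambda j: math.dist(X[i], X[j]))
        # Step 3 (assumed rule): the root's share of the deficit grows with its
        # number of majority neighbours.
        quota = round(deficit * n_major[i] / total_weight)
        # Assumed sphere: centred on the midpoint, root-candidate segment as diameter.
        center = [(a + b) / 2 for a, b in zip(X[i], X[cand])]
        radius = math.dist(X[i], X[cand]) / 2
        for _ in range(quota):
            for _ in range(max_tries):
                s = sample_in_ball(center, radius, rng)
                # Step 4 (assumed check): keep s only if its nearest original point is minority.
                if y[k_nearest(s, X, 1)[0]] == minority:
                    synthetic.append(s)
                    break
    return synthetic
```

Calling `border_oversample(X, y, minority=1, k=5)` on a list of feature vectors `X` and labels `y` returns a list of new minority points to append to the training set. Sampling from a ball rather than from the line segment used by SMOTE is what widens the synthesis region; the final nearest-neighbour check is one simple way to reject points that drift into the majority class.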

Keywords: SMOTE; KNN; oversampling algorithm; imbalanced data; ISMOTE

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
