Safe Sample Screening Based Sampling Method for Imbalanced Data  Cited by: 6


Authors: SHI Hongbo [1]; LIU Yanxin; JI Suqin [1] (College of Information, Shanxi University of Finance and Economics, Taiyuan 030006)

Affiliation: [1] College of Information, Shanxi University of Finance and Economics

Source: Pattern Recognition and Artificial Intelligence, 2019, No. 6, pp. 545-556 (12 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 61801279) and the Natural Science Foundation of Shanxi Province (No. 2014011022-2, 201801D121115)

Abstract: Undersampling may discard useful information, and the synthetic minority oversampling technique (SMOTE) may aggravate the class overlap between the majority class and the minority class. To address both problems, this paper proposes Screening_SMOTE, a sampling method that combines safe sample screening based undersampling with SMOTE. Safe screening rules are used to identify and discard some majority-class instances that carry no value for determining the decision boundary, together with noise instances; SMOTE is then applied to oversample the screened dataset. The screening-based undersampling avoids losing valuable information from the original data while discarding noise instances in the majority class, which alleviates class overlap in the oversampled dataset. Experimental results show that Screening_SMOTE is effective for rebalancing imbalanced datasets, especially high-dimensional ones.
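The two-stage pipeline described in the abstract (screen the majority class, then oversample the minority class) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the safe screening rules, so `screen_majority` uses a hypothetical distance-based stand-in (drop majority instances whose `k` nearest neighbours are all minority, or that lie farther than `far_radius` from every minority instance). The SMOTE step is the standard interpolation between a minority instance and one of its minority neighbours. All function and parameter names here are illustrative.

```python
import math
import random


def screen_majority(majority, minority, k=3, far_radius=4.5):
    """Stand-in screening step (NOT the paper's safe screening rules).

    A majority instance is dropped if it looks like noise (its k nearest
    neighbours are all minority instances) or if it is farther than
    far_radius from every minority instance (assumed here to have no
    influence on the decision boundary).
    """
    kept = []
    for x in majority:
        labelled = [(m, 0) for m in majority if m is not x]
        labelled += [(m, 1) for m in minority]
        neighbours = sorted(labelled, key=lambda pl: math.dist(x, pl[0]))[:k]
        is_noise = all(label == 1 for _, label in neighbours)
        is_far = min(math.dist(x, m) for m in minority) > far_radius
        if not (is_noise or is_far):
            kept.append(x)
    return kept


def smote(minority, n_new, k=3, rng=None):
    """Generate n_new synthetic minority instances by interpolation."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        neighbour = rng.choice(sorted(others, key=lambda m: math.dist(x, m))[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def screening_smote(majority, minority, k=3, far_radius=4.5, seed=0):
    """Screen the majority class, then oversample the minority class to match it."""
    screened = screen_majority(majority, minority, k, far_radius)
    n_new = max(0, len(screened) - len(minority))
    return screened, minority + smote(minority, n_new, k, random.Random(seed))


# Toy 2-D data: six majority points near the minority cluster, two far away,
# and one majority point sitting inside the minority cluster (noise).
majority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (3.0, 2.0),
            (2.0, 3.0), (-6.0, -6.0), (20.0, 20.0), (4.1, 4.0)]
minority = [(4.0, 4.0), (4.0, 4.2), (4.2, 4.0)]

screened, balanced_minority = screening_smote(majority, minority)
print(len(screened), len(balanced_minority))
```

In this toy run the two distant majority points and the one embedded in the minority cluster are removed before oversampling, so the synthetic minority instances are generated against a smaller, cleaner majority class. The threshold `far_radius` and the neighbour count `k` are arbitrary choices for the example.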

Keywords: Imbalanced Data; Safe Sample Screening; Undersampling; Imbalance Ratio; Synthetic Minority Oversampling Technique (SMOTE)

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
