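Authors: SHI Hongbo (石洪波) [1], LIU Yanxin (刘焱昕), JI Suqin (冀素琴) [1] (College of Information, Shanxi University of Finance and Economics, Taiyuan 030006)
Affiliation: [1] College of Information, Shanxi University of Finance and Economics (山西财经大学信息学院)
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2019, No. 6, pp. 545-556 (12 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 61801279) and the Natural Science Foundation of Shanxi Province (No. 2014011022-2, 201801D121115)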
Abstract: Undersampling may discard useful information, and the synthetic minority oversampling technique (SMOTE) may aggravate class overlap between the majority class and the minority class. To address both problems, this paper proposes Screening_SMOTE, a sampling method that combines undersampling based on safe sample screening with SMOTE. Safe screening rules are used to identify and discard noise instances in the majority class, together with some of the majority-class instances that carry no value for determining the decision boundary. SMOTE is then applied to oversample the screened dataset. Undersampling based on safe sample screening avoids losing valuable information from the original data while discarding noise instances in the majority class, which alleviates class overlap in the oversampled dataset. Experimental results show that Screening_SMOTE is effective for rebalancing imbalanced datasets, especially high-dimensional ones.
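Keywords: imbalanced data; safe sample screening; undersampling; imbalance ratio; synthetic minority oversampling technique (SMOTE)
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]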
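The two-stage pipeline described in the abstract (screen the majority class, then oversample the minority class with SMOTE) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not give the safe screening rules, so `screen_majority` is a hypothetical distance-based stand-in that drops majority points surrounded by minority points (treated as noise) and the majority points farthest from the minority class (treated as non-informative). The function names and the parameters `k`, `drop_fraction`, and `seed` are likewise assumptions. Only the `smote` step follows the standard SMOTE interpolation.

```python
import math
import random


def screen_majority(majority, minority, k=3, drop_fraction=0.3):
    """Hypothetical stand-in for safe sample screening (NOT the paper's rule).

    Drops majority points whose k nearest neighbours are all minority points
    (treated as noise), then drops the drop_fraction of remaining points that
    lie farthest from the minority class (treated as non-informative).
    """
    kept = []
    for i, x in enumerate(majority):
        # Label 0 = majority neighbour, 1 = minority neighbour.
        neighbours = sorted(
            [(math.dist(x, y), 0) for j, y in enumerate(majority) if j != i]
            + [(math.dist(x, m), 1) for m in minority]
        )[:k]
        if not all(label == 1 for _, label in neighbours):
            kept.append(i)
    # Sort survivors by distance to the nearest minority point, closest first.
    kept.sort(key=lambda i: min(math.dist(majority[i], m) for m in minority))
    n_keep = len(kept) - int(drop_fraction * len(kept))
    return [majority[i] for i in kept[:n_keep]]


def smote(minority, n_new, k=3, rng=None):
    """Standard SMOTE step: interpolate between a minority point and one of
    its k nearest minority neighbours. Needs at least two minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


def screening_smote(majority, minority, k=3, drop_fraction=0.3, seed=0):
    """Screen the majority class, then oversample the minority class until
    both classes have the same size."""
    screened = screen_majority(majority, minority, k, drop_fraction)
    n_new = max(0, len(screened) - len(minority))
    synthetic = smote(minority, n_new, k, random.Random(seed))
    return screened, list(minority) + synthetic
```

Calling `screening_smote(majority, minority)` with two lists of coordinate tuples returns the screened majority class and the minority class extended with synthetic points. Screening before oversampling is the ordering the abstract describes: removing noisy majority instances first means SMOTE interpolates in a region with fewer overlapping majority points.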