CLASS IMBALANCE DATA CLASSIFICATION ALGORITHM BASED ON CPD-SMOTE  Cited by: 7

Authors: Peng Ruxiang [1,2]; Yang Tao; Kong Huafeng [1,2]; Jiang Guoqing; Fan Yourong [1,2] (Third Research Institute of Ministry of Public Security, Shanghai 201204, China; Key Lab of Information Network Security, Shanghai 201204, China)

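Affiliations: [1] Third Research Institute of the Ministry of Public Security, Shanghai 201204 [2] Key Laboratory of Information Network Security, Ministry of Public Security, Shanghai 201204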

Source: Computer Applications and Software (《计算机应用与软件》), 2018, No. 12, pp. 259-262, 268 (5 pages)

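Funding: National Key R&D Program of China (2016YFC0800909); Ministry of Public Security Special Project for Basic Work on Strengthening Police through Science and Technology (2018GBJC19); Shanghai Science and Technology Commission Research Project (17DZ1101004)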

Abstract: Class imbalance is common across application domains such as financial fraud detection, network intrusion detection, spam filtering, and medical testing, and applying traditional classification algorithms directly to such data yields low classification accuracy. To address the effect of class imbalance on classifiers, and building on the effectiveness of the traditional oversampling algorithm SMOTE (Synthetic Minority Oversampling Technique), this paper proposes CPD-SMOTE, an improved SMOTE algorithm for classifying class-imbalanced datasets. CPD-SMOTE considers the features and location of each minority-class sample in the training set, together with the distribution of its surrounding samples, to determine a set of strongly correlated neighbors. This set is then used in place of SMOTE's nearest-neighbor set to generate new minority-class samples. Experimental results show that CPD-SMOTE improves on SMOTE, Borderline-SMOTE, ADASYN, and LN-SMOTE when handling imbalanced datasets.
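
To make the interpolation step concrete, the following is a minimal sketch of plain SMOTE, not the authors' CPD-SMOTE implementation. The abstract does not specify how the strongly correlated neighbor set is computed, so the sketch only exposes a hypothetical `neighbor_fn` hook where such a rule could be substituted. The default, `k_nearest_minority`, is the standard k-nearest-neighbor choice. All function and parameter names here are assumptions made for illustration.

```python
import math
import random


def k_nearest_minority(minority, i, k):
    """Indices of the k minority samples closest to minority[i] (Euclidean)."""
    others = [j for j in range(len(minority)) if j != i]
    others.sort(key=lambda j: math.dist(minority[i], minority[j]))
    return others[:k]


def smote(minority, n_new, k=5, neighbor_fn=k_nearest_minority, seed=0):
    """Generate n_new synthetic minority samples by SMOTE interpolation.

    neighbor_fn is a hypothetical hook (not from the paper): plain SMOTE
    uses the k nearest minority neighbors, while a CPD-SMOTE-style variant
    would plug in its own neighbor-selection rule here.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))          # pick a minority sample
        j = rng.choice(neighbor_fn(minority, i, k))  # pick one of its neighbors
        gap = rng.random()                        # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        # New point lies on the segment between the sample and its neighbor.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, n_new=3, k=2):
        print(point)
```

Because each synthetic point lies on a segment between a minority sample and one of its selected neighbors, the quality of the neighbor set directly determines where new samples land, which is the part of SMOTE the abstract says CPD-SMOTE changes.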

Keywords: SMOTE; class imbalance; classification algorithm

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
