Research on a Weighted Support Vector Machine Method for Unbalanced Samples Based on Redundant Data Elimination  Cited by: 4

A WEIGHTED SUPPORT VECTOR MACHINES FOR UNBALANCED DATA SET WITH REDUNDANT DATA REMOVING


Authors: GAO Wenyun (高文昀); DAI Sheng (戴胜); TU Liping (涂丽萍); ZHANG Ye (张叶)

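Affiliations: [1] Nanjing Les Information Technology Co., Ltd., Nanjing 210014, Jiangsu, China; [2] North Information Control Research Academy Group Co., Ltd., Nanjing 211100, Jiangsu, China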

Source: Changjiang Information & Communications (《长江信息通信》), 2022, No. 1, pp. 46-50 (5 pages)

Funding: National Key R&D Program of China (2020YFC1511800).

Abstract: Existing support vector machines (SVMs) suffer from two problems when the training set is very large or its classes are unbalanced: training takes too long, and the learned classification surface deviates from the optimal one, so samples are misclassified. To address this, the paper proposes a weighted SVM method for unbalanced samples based on redundant data elimination. The method uses the Fisher discriminant ratio (FDR) criterion to remove redundant data, that is, training samples that do not help in training the final classification surface, and introduces a sample weighting strategy that assigns each training sample a weight according to its contribution to the fuzzy classification surface. Experimental results show that, compared with a conventional SVM, the method greatly shortens SVM training time on large unbalanced data sets and reduces the misclassification of test samples caused by class imbalance, improving the SVM's classification performance.
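The abstract names two ingredients, the Fisher discriminant ratio and per-sample weights, without giving the paper's formulas. As a rough illustration only, the sketch below computes the standard two-class FDR of a single feature, (mean_a - mean_b)^2 / (var_a + var_b), and a common inverse-class-frequency weighting. Both function names are hypothetical. The weighting is a generic stand-in for class imbalance, not the paper's contribution-based scheme, and the paper's rule for deciding which samples are redundant is not reproduced here.

```python
from collections import Counter
from statistics import mean, pvariance


def fisher_discriminant_ratio(a, b):
    """Two-class FDR of one feature: (mean_a - mean_b)^2 / (var_a + var_b).

    Larger values mean the feature separates the two classes better.
    """
    spread = pvariance(a) + pvariance(b)
    if spread == 0:
        # Both classes are constant: perfectly separable, or identical.
        return float("inf") if mean(a) != mean(b) else 0.0
    return (mean(a) - mean(b)) ** 2 / spread


def inverse_frequency_weights(labels):
    """Assumed stand-in weighting: n / (k * count(label)) per sample.

    Samples from rarer classes receive larger weights; this is NOT the
    contribution-based weighting described in the abstract.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


if __name__ == "__main__":
    print(fisher_discriminant_ratio([0, 2], [4, 6]))
    print(inverse_frequency_weights([0, 0, 0, 1]))
```

Weights of this kind could be passed to any SVM trainer that accepts per-sample weights, so that errors on the minority class are penalized more heavily.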

Keywords: support vector machine; optimal classification surface; redundant data; sample weighting; sample imbalance

Classification code: TP309 [Automation and Computer Technology: Computer System Architecture]

 
