Authors: LU Shu-xia (鲁淑霞)[1,2]; ZHANG Zhen-lian (张振莲)
Affiliations: [1] College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China; [2] Hebei Province Key Laboratory of Machine Learning and Computational Intelligence, Baoding, Hebei 071002, China
Source: Computer Science (《计算机科学》), 2021, No. 11, pp. 184-191 (8 pages)
Funding: National Natural Science Foundation of China (61672205); Key Research and Development Project of the Hebei Province Science and Technology Plan (19210310D).
Abstract: To address the classification of imbalanced data, this paper proposes an AdaBoost_v algorithm based on the optimal margin. The algorithm uses an improved SVM as the base classifier: a margin mean term is introduced into the SVM optimization model, and both the margin mean term and the loss function term are weighted according to the imbalance ratio of the data. The optimization model is solved with the stochastic variance reduced gradient (SVRG) method to speed up convergence. The proposed algorithm introduces a new adaptive cost-sensitive function into the sample weight update formula, assigning higher cost values to minority-class samples, misclassified minority-class samples, and minority-class samples close to the decision boundary. In addition, by combining the new weight formula with an estimate of the optimal margin under a given precision parameter v, a new weighting strategy for the base classifiers is derived, which further improves classification accuracy. Comparative experiments show that, in both the linear and nonlinear cases, the proposed algorithm achieves higher classification accuracy on imbalanced datasets than the other algorithms compared and obtains a larger minimum margin.
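Illustrative note: the SVRG step mentioned in the abstract can be sketched on a simplified model. The abstract does not give the exact form of the margin mean term or of the weighting, so the sketch below omits the margin mean term and assumes a plain L2-regularized linear SVM with hinge loss, where minority-class samples receive a cost equal to the imbalance ratio. The function names (`svrg_weighted_svm`, `grad_i`, `objective`) and the default parameters are hypothetical and are not taken from the paper.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def grad_i(w, x, y, c, lam):
    """Subgradient at w of lam/2*||w||^2 + c*max(0, 1 - y*<w, x>)."""
    active = y * dot(w, x) < 1.0  # hinge term contributes only inside the margin
    return [lam * wj - (c * y * xj if active else 0.0) for wj, xj in zip(w, x)]


def objective(w, X, Y, C, lam):
    """Regularized, cost-weighted average hinge loss."""
    hinge = sum(c * max(0.0, 1.0 - y * dot(w, x)) for x, y, c in zip(X, Y, C))
    return 0.5 * lam * dot(w, w) + hinge / len(X)


def svrg_weighted_svm(X, Y, lam=0.01, eta=0.01, epochs=30, seed=0):
    """SVRG on a class-weighted linear SVM; labels in Y must be +1 or -1.

    Assumes both classes are present. Returns (w, C), where C holds the
    per-sample costs that were used.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_pos = sum(1 for y in Y if y == 1)
    n_neg = n - n_pos
    # Assumption: minority cost = imbalance ratio, majority cost = 1.
    ratio = max(n_pos, n_neg) / min(n_pos, n_neg)
    minority = 1 if n_pos < n_neg else -1
    C = [ratio if y == minority else 1.0 for y in Y]

    w = [0.0] * d
    for _ in range(epochs):
        snap = w[:]  # snapshot point for this epoch
        mu = [0.0] * d  # full (sub)gradient at the snapshot
        for x, y, c in zip(X, Y, C):
            g = grad_i(snap, x, y, c, lam)
            mu = [m + gj / n for m, gj in zip(mu, g)]
        for _ in range(n):  # inner loop: variance-reduced stochastic steps
            i = rng.randrange(n)
            g_w = grad_i(w, X[i], Y[i], C[i], lam)
            g_s = grad_i(snap, X[i], Y[i], C[i], lam)
            w = [wj - eta * (a - b + m) for wj, a, b, m in zip(w, g_w, g_s, mu)]
    return w, C
```

A bias term can be handled by appending a constant feature 1.0 to every sample. The correction `g_w - g_s + mu` is what distinguishes SVRG from plain SGD: it keeps the stochastic step unbiased while shrinking its variance as the iterate and the snapshot approach each other.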
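Keywords: imbalanced data; SVRG; AdaBoost_v; optimal margin; adaptive cost-sensitive function
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]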