基于自适应随机梯度下降方法的非平衡数据分类  被引量:3

Adaptive Stochastic Gradient Descent for Imbalanced Data Classification

在线阅读下载全文

作  者:陶秉墨 鲁淑霞[1,2] TAO Bing -mo1, LU Shu- xia1,2(1College of Matheniatics and Information Science, Hebei University,Baoding, Hebei 071002, China;2Hebei Province Key Laboratory of Machine Learning and Computational Intelligence, Baoding, Hebei 071002, Chin)

机构地区:[1]河北大学数学与信息科学学院,河北保定071002 [2]河北省机器学习与计算智能重点实验室,河北保定071002

出  处:《计算机科学》2018年第B06期487-492,共6页Computer Science

基  金:河北省自然科学基金(F2015201185)资助

摘  要:对于不平衡数据分类问题,传统的随机梯度下降方法在求解一般的支持向量机问题时会产生一定的偏差,导致效果较差。自适应随机梯度下降算法定义了一个分布p,在选择样例进行迭代更新时,其依据分布p而非依据均匀分布来选择样例,并且在优化问题中使用光滑绞链损失函数。对于不平衡的训练集,依据均匀分布选择样例时,数据的不平衡比率越大,多数类中的样例被选择的次数就越多,从而导致结果偏向少数类。分布p在很大程度上解决了这个问题。普通的随机梯度下降算法没有明确的停机准则,这导致何时停机成为一个很重要的问题,尤其是在大型数据集上进行训练时。以训练集或训练集的子集中的分类准确率为标准来设定停机准则,如果参数设定恰当,算法几乎可以在迭代的早期就停止,这种现象在大中型数据集上表现得尤为突出。在一些不平衡数据集上的实验证明了所提算法的有效性。For imbalanced data classification,the performance of using traditional stochastic gradient descent for solving SVM problems is not very well.Adaptive stochastic gradient descent algorithm defines a distribution p instead of using uniform distribution to choose examples,and the smoothing hinge loss function is used in the optimization problem.Because of the training sets are imbalanced,using uniform distribution will cause the algorithm choose more majority class based on the imbalanced ratio.That would result the classifier bias towards the minority class.The distribution p largely overcomes this issue.When to stop the programs becomes an important problem,because the normal stochastic gradient descent algorithm does not have a stop criterion especially for large data sets.The stop criterion was setted according to the classification accuracy on the training sets or its subsets.This stop criterion could stop the programs very early especially for large data sets if the parameters are chosen properly.Some experiments on imbalanced data sets show that the proposed algorithm is effective.

关 键 词:随机梯度下降 非均匀分布 停机准则 支持向量机 损失函数 

分 类 号:TP181[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象