Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (10): 1662-1671.DOI: 10.3778/j.issn.1673-9418.1609009


Imbalanced Weighted Stochastic Gradient Descent Online Algorithm for SVM

LU Shuxia+, ZHOU Mi, JIN Zhao   

  1. Hebei Province Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China
  • Online:2017-10-01 Published:2017-10-20


鲁淑霞+,周  谧,金  钊   


Abstract: Stochastic gradient descent (SGD) has been applied to large-scale support vector machine (SVM) training. Because SGD selects training points at random, in imbalanced classification problems a majority-class point is far more likely to be drawn than a minority-class point, which makes the computation itself imbalanced. To handle large-scale imbalanced data classification, this paper proposes a weighted stochastic gradient descent online algorithm for SVM. Samples in the majority class are assigned smaller weights and samples in the minority class are assigned larger weights; the weighted SGD algorithm is then used to solve the primal problem of SVM. This reduces the shift of the separating hyperplane toward the minority class and thereby better handles large-scale imbalanced data classification.

Key words: stochastic gradient descent (SGD), weight, imbalanced data, large scale learning, support vector machine (SVM)
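
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the weight formula, the step size, or how the bias term is handled, so everything specific here is an assumption made for illustration: the inverse-frequency weights in `class_weights`, the Pegasos-style step size 1/(λt), the bias-free linear model, and the function names themselves. It should not be read as the authors' exact algorithm.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def class_weights(labels):
    """Assumed weighting: inverse class frequency, so the rarer class
    receives the larger weight (not taken from the paper)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_sgd_svm(X, y, lam=0.01, epochs=20, seed=0):
    """SGD on the weighted primal objective
        lam/2 * ||w||^2 + mean_i c_i * max(0, 1 - y_i * <w, x_i>)
    with labels in {-1, +1} and no bias term (an assumption)."""
    rng = random.Random(seed)
    cw = class_weights(y)
    w = [0.0] * len(X[0])
    for t in range(1, epochs * len(X) + 1):
        i = rng.randrange(len(X))        # draw one sample uniformly at random
        eta = 1.0 / (lam * t)            # assumed Pegasos-style step size
        margin = y[i] * dot(w, X[i])     # margin under the current w
        # Gradient step on the regularizer: shrink w.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Gradient step on the hinge loss, scaled by the class weight.
        if margin < 1.0:
            c = cw[y[i]]
            w = [wj + eta * c * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Sign of the linear score; ties go to the positive class."""
    return 1 if dot(w, x) >= 0 else -1
```

With nine majority samples and one minority sample, `class_weights` gives the minority class a weight of 5.0 and the majority class about 0.56, so a margin violation on the rarely drawn minority point moves the hyperplane roughly nine times as far as one on a majority point. This compensates for the minority class being sampled less often.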
