Journal of Frontiers of Computer Science and Technology

• Academic Research •

A Variance Reduction Optimization Algorithm Based on Random Sampling

GUO Zhenhua, YAN Ruidong, QIU Zhiyong, ZHAO Yaqian, LI Rengang   

  1. Shandong Massive Information Technology Research Institute, Jinan 250101, China

Abstract: Stochastic gradient descent (SGD) algorithms have attracted extensive attention from researchers in machine learning, deep learning, and related fields due to their superior performance. However, SGD uses the stochastic gradient of a single sample to approximate the full gradient over all samples, which introduces additional variance at each iteration, makes the convergence curve oscillate or even diverge, and slows convergence. Effectively reducing this variance is therefore a key challenge. To address it, a variance reduction optimization algorithm based on mini-batch random sampling, DM-SRG (double mini-batch stochastic recursive gradient), is proposed and applied to both convex and non-convex optimization problems. The main feature of the algorithm is its inner-outer double-loop structure: the outer loop uses a mini-batch of random samples to compute a gradient that approximates the full gradient, reducing the cost of gradient computation; the inner loop also uses a mini-batch of random samples to compute the gradient in place of the single-sample stochastic gradient, improving the convergence stability of the algorithm. Theoretical analysis shows that DM-SRG achieves a sublinear convergence rate for both non-convex and convex objective functions. In addition, a dynamic sample-size adjustment strategy based on a performance evaluation model of the computing units is designed to improve training efficiency. The effectiveness of the algorithm is evaluated through numerical experiments on real datasets of different sizes. The results show that, compared with the baseline algorithms, DM-SRG reduces the loss function by 18.1% and the average running time by 8.22%.

Key words: stochastic gradient descent, variance reduction, convex optimization, non-convex optimization, convergence rate
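
The abstract describes DM-SRG only at a high level: an outer loop that replaces the full gradient with a larger mini-batch gradient, and an inner loop that uses small mini-batch gradients in a recursive estimate. The Python sketch below shows one way such a double mini-batch stochastic recursive gradient loop could look; it assumes the standard SARAH-style recursive update that the algorithm's name suggests, uses fixed batch sizes rather than the paper's dynamic sample-size adjustment strategy, and all names (dm_srg, grad_fn, outer_batch, inner_batch) are illustrative rather than taken from the paper.

    import numpy as np

    def dm_srg(grad_fn, w0, n, outer_iters=50, inner_iters=100,
               outer_batch=256, inner_batch=32, lr=0.01, rng=None):
        # grad_fn(w, idx) returns the average gradient of the component
        # functions indexed by idx, evaluated at the point w.
        rng = np.random.default_rng() if rng is None else rng
        w = np.asarray(w0, dtype=float)

        for _ in range(outer_iters):
            # Outer loop: a larger mini-batch gradient stands in for the
            # full gradient, avoiding a pass over all n samples.
            idx = rng.choice(n, size=min(outer_batch, n), replace=False)
            v = grad_fn(w, idx)                  # anchor gradient estimate
            w_prev, w = w, w - lr * v

            for _ in range(inner_iters):
                # Inner loop: a small mini-batch (rather than a single
                # sample) drives a recursive update of the estimate v.
                idx = rng.choice(n, size=min(inner_batch, n), replace=False)
                v = grad_fn(w, idx) - grad_fn(w_prev, idx) + v
                w_prev, w = w, w - lr * v
        return w

For a least-squares objective with data matrix A and targets b, for instance, grad_fn could be lambda w, idx: A[idx].T @ (A[idx] @ w - b[idx]) / len(idx); the same loop applies unchanged to non-convex losses.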