Variance Reduction on General Adaptive Stochastic Mirror Descent

Published in Machine Learning (Previously at OPT@NeurIPS'20), 2022

Recommended citation: Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng. Variance Reduction on General Adaptive Stochastic Mirror Descent. Machine Learning (2022).

[MLJ paper] [OPT poster] [OPT paper] [slides] [arXiv]

In this work, we study variance reduction applied to adaptive stochastic mirror descent algorithms for nonsmooth nonconvex finite-sum optimization problems. We propose a simple yet general adaptive mirror descent algorithm with variance reduction, named SVRAMD, and provide its convergence analysis in different settings. We prove that variance reduction reduces the stochastic first-order oracle (SFO) complexity of most adaptive mirror descent algorithms and accelerates their convergence. In particular, our general theory implies that variance reduction can be applied to algorithms using time-varying step sizes and to self-adaptive algorithms such as AdaGrad and RMSProp. Moreover, the convergence rates of SVRAMD recover the best existing rates of non-adaptive variance-reduced mirror descent algorithms. We validate our claims with experiments in deep learning.
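
To give a rough feel for how variance reduction can be combined with an adaptive mirror descent step, here is a minimal sketch. It is not the SVRAMD pseudocode from the paper. It assumes an SVRG-style gradient estimator (a full gradient at a periodic snapshot plus a mini-batch correction), an AdaGrad-style diagonal proximal function, an l1 regularizer as the nonsmooth term, and a least-squares loss chosen only to keep the example short (it is convex, unlike the setting studied in the paper). The function names, hyperparameters, and test problem are all hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||x||_1, applied coordinate-wise.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def objective(A, b, x, lam):
    # (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 + lam * ||x||_1
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.abs(x).sum()

def svr_adaptive_md(A, b, lam=0.01, alpha=0.1, epochs=20, inner=50,
                    batch=8, eps=1e-8, seed=0):
    # Hypothetical illustration: SVRG-style estimator + AdaGrad-style
    # diagonal mirror step. Not the authors' implementation.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    g_sq = np.zeros(d)  # running sum of squared gradient estimates
    for _ in range(epochs):
        # Outer loop: take a snapshot and compute its full gradient.
        snapshot = x.copy()
        full_grad = A.T @ (A @ snapshot - b) / n
        for _ in range(inner):
            idx = rng.choice(n, size=batch, replace=False)
            Ai, bi = A[idx], b[idx]
            grad_x = Ai.T @ (Ai @ x - bi) / batch
            grad_snap = Ai.T @ (Ai @ snapshot - bi) / batch
            # Variance-reduced gradient estimate.
            v = grad_x - grad_snap + full_grad
            # AdaGrad-style diagonal scaling defines the proximal function
            # psi_t(x) = 0.5 * x^T diag(H) x.
            g_sq += v ** 2
            H = np.sqrt(g_sq) + eps
            # Mirror (proximal) step: with a diagonal psi_t and an l1 term
            # it reduces to a scaled soft-thresholding update.
            x = soft_threshold(x - alpha * v / H, alpha * lam / H)
    return x
```

A possible use, on synthetic data: draw a random matrix `A`, set `b = A @ x_true` for a sparse `x_true`, call `svr_adaptive_md(A, b)`, and compare `objective` at the returned point with its value at the zero vector. Replacing the AdaGrad accumulator with an exponential moving average would give an RMSProp-style variant of the same sketch.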