ICLR 2015

Adam: A Method for Stochastic Optimization

Optimization Deep Learning

Authors

Diederik P. Kingma, Jimmy Lei Ba

Conference

ICLR 2015

Abstract

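Adam (Adaptive Moment Estimation) combines the advantages of AdaGrad, which works well with sparse gradients, and RMSProp, which works well on non-stationary objectives. It computes an individual adaptive learning rate for each parameter from estimates of the first and second moments of the gradients.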
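
To make "adaptive learning rates for each parameter" concrete, here is a minimal sketch in plain Python of the very first Adam step for three parameters whose gradients differ by orders of magnitude. The function name `first_adam_step` and the gradient values are illustrative assumptions, not taken from the paper; the default hyperparameters are the ones the paper suggests.

```python
import math

def first_adam_step(grads, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the per-parameter update of the first Adam step (t = 1).

    Hypothetical helper for illustration only.
    """
    updates = []
    for g in grads:
        m = (1 - beta1) * g        # first-moment average, starting from 0
        v = (1 - beta2) * g * g    # second-moment average, starting from 0
        m_hat = m / (1 - beta1)    # bias correction at t = 1
        v_hat = v / (1 - beta2)
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates

# Hypothetical gradients of very different scale.
grads = [0.01, 1.0, -100.0]
updates = first_adam_step(grads)
```

Under these assumptions every update has a magnitude close to `lr`, whatever the scale of its gradient; plain SGD would instead take steps proportional to the gradients themselves.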

Algorithm

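  • Maintains an exponentially decaying average of past gradients (the first moment, which acts like momentum)
  • Also maintains an exponentially decaying average of past squared gradients (the second moment, which scales the step size per parameter)
  • Corrects both averages for their bias toward zero, which arises because they are initialized at zero, then divides the first by the square root of the second to form the update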
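
The running averages described above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the names `adam_minimize` and `quadratic_grad`, the toy objective, and the choice of 5000 steps with `lr=0.01` are assumptions made for the example. The bias-correction step and the default values of `beta1`, `beta2`, and `eps` follow the paper.

```python
import math

def adam_minimize(grad_fn, theta, steps=1000, lr=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam for a fixed number of steps and return the final parameters."""
    theta = list(theta)
    m = [0.0] * len(theta)  # decaying average of gradients
    v = [0.0] * len(theta)  # decaying average of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
            # Both averages start at zero, so rescale them early on.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def quadratic_grad(theta):
    """Gradient of the toy objective f(x, y) = (x - 3)^2 + 10 * (y + 1)^2."""
    x, y = theta
    return [2 * (x - 3), 20 * (y + 1)]

result = adam_minimize(quadratic_grad, [0.0, 0.0], steps=5000, lr=0.01)
```

On this toy problem `result` should end up close to the minimizer (3, -1), even though the two coordinates have curvatures that differ by a factor of ten, because each coordinate's step is normalized by its own second-moment estimate.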

Impact

Adam became the de facto default optimizer for much of deep learning. SGD with momentum is still preferred for some vision tasks, but Adam is ubiquitous in NLP and general-purpose deep learning.