Authors
Ioffe, Szegedy
Conference
ICML 2015
Problem
Internal covariate shift: distribution of layer inputs changes during training, slowing convergence.
Solution
Normalize layer inputs to have zero mean and unit variance within each mini-batch.
Impact
- Allows higher learning rates
- Acts as regularizer (reduces need for dropout)
- Enables training of much deeper networks
Became standard component in most architectures.