Authors
Srivastava et al.
Venue
JMLR 2014
Abstract
Dropout randomly drops units during training, preventing co-adaptation and reducing overfitting.
Method
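During training, each activation is independently set to zero with probability p (typically 0.5 for hidden units). At test time, all units are kept and activations are multiplied by (1 - p), the keep probability, so that each unit's expected output matches what the next layer saw during training. Note that the paper itself defines p as the probability of retaining a unit, so its test-time scaling factor is written as p.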
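The train/test rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names `dropout_train` and `dropout_test` are hypothetical, activations are a flat list of floats, and p is the drop probability as in this note.

```python
import random


def dropout_train(activations, p=0.5, rng=random):
    """Training pass: zero each activation independently with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]


def dropout_test(activations, p=0.5):
    """Test pass: keep every unit, scale by the keep probability (1 - p)."""
    return [a * (1.0 - p) for a in activations]


# Compare the average activation under both modes on a constant input.
rng = random.Random(0)  # fixed seed so the run is repeatable
x = [1.0] * 10000
train_mean = sum(dropout_train(x, 0.5, rng)) / len(x)
test_mean = sum(dropout_test(x, 0.5)) / len(x)
print(train_mean, test_mean)
```

With p = 0.5 the test-time mean is exactly 0.5 and the training-time mean should land close to it, which is the matching of expectations that the scaling is meant to achieve. Many libraries instead divide by (1 - p) during training and leave test-time activations untouched; the two conventions agree in expectation.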
Why It Works
- Discourages units from co-adapting, so the network learns redundant, more robust representations
- Approximates averaging an ensemble of exponentially many "thinned" networks that share weights
- Acts as a strong regularizer
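The ensemble view can be checked directly in a toy case. The sketch below assumes a single linear unit with four inputs and p = 0.5, so all 2^4 dropout masks are equally likely; the weights and inputs are made-up values. For a linear unit the average over every mask equals the scaled full-network output exactly; for networks with nonlinearities the scaling is only an approximation of the ensemble average.

```python
from itertools import product

# Hypothetical toy values for one linear unit.
weights = [0.5, -1.0, 2.0, 0.25]
inputs = [1.0, 2.0, 3.0, 4.0]


def linear_output(mask, scale=1.0):
    """Weighted sum of the inputs that the mask keeps, times a scale factor."""
    return sum(m * w * x * scale for m, w, x in zip(mask, weights, inputs))


# Enumerate every thinned network: one per binary mask over the inputs.
masks = list(product([0, 1], repeat=len(inputs)))

# Average prediction of the full ensemble (uniform over masks when p = 0.5).
ensemble_mean = sum(linear_output(m) for m in masks) / len(masks)

# Single forward pass with all units kept and outputs scaled by (1 - p).
scaled_output = linear_output([1] * len(inputs), scale=0.5)

print(len(masks), ensemble_mean, scaled_output)
```

The number of masks doubles with every added unit, which is why explicit ensembling is infeasible for real networks and the single scaled pass is used instead.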
Especially useful when training large networks on small datasets, where overfitting is most severe.