Authors: He et al.
Conference: CVPR 2016
Problem
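Deeper neural networks are harder to train. The degradation problem: as plain networks get deeper, accuracy saturates and then degrades. This is not caused by overfitting, because the deeper network also has higher training error than its shallower counterpart.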
Solution: Residual Learning
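Instead of having a stack of layers learn the desired mapping directly, let it learn the residual with respect to its input x.
Traditional: the layers fit H(x).
ResNet: the layers fit F(x) = H(x) - x, so the block output is F(x) + x.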
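Written out for a two-layer block in the paper's notation (σ denotes ReLU, biases omitted), the residual formulation reads:

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x} = H(\mathbf{x}),
\qquad
\mathcal{F}(\mathbf{x}, \{W_i\}) = W_2\,\sigma(W_1 \mathbf{x})
```

A second ReLU is then applied to y after the addition. If the identity is the optimal mapping, H(x) = x requires only F(x) = 0, which W_2 = 0 achieves.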
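This is implemented with shortcut (skip) connections that perform identity mapping; they add no extra parameters and no computation beyond the element-wise addition.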
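import torch
from torch import nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # padding=1 keeps H and W fixed so x + residual shapes match (BatchNorm omitted)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        residual = x
        x = torch.relu(self.conv1(x))
        x = self.conv2(x)
        x = x + residual  # Skip connection
        return torch.relu(x)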
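A dependency-light sketch of the same forward pass (conv, ReLU, conv, add, ReLU) in NumPy. It assumes dense weight matrices `w1` and `w2` (hypothetical names) stand in for the two convolutions. It shows that when the residual branch's weights are zero, the block reduces to relu(x), which is exactly x for the non-negative activations a preceding ReLU produces.

```python
import numpy as np

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Dense stand-in for the conv block: relu(F(x) + x), F(x) = w2 @ relu(w1 @ x)."""
    residual = x               # identity shortcut, no parameters
    f = w2 @ relu(w1 @ x)      # residual branch F(x)
    return relu(f + residual)  # element-wise add, then the second ReLU

rng = np.random.default_rng(0)
d = 4
x = relu(rng.standard_normal(d))  # non-negative, like a previous block's output
zeros = np.zeros((d, d))

# With F pushed to zero, the block is an exact identity on non-negative inputs.
print(np.allclose(residual_block(x, zeros, zeros), x))
```

This mirrors the argument in the next section: to represent the identity, the solver only has to drive the branch weights toward zero.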
Why It Works
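- If the optimal mapping is close to the identity, it is easier for the solver to push the residual F(x) toward zero than to fit an identity mapping from scratch with a stack of nonlinear layers.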
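- Gradients flow directly through the skip connections: because the block computes F(x) + x, its Jacobian is dF/dx + I, and the identity term passes gradients backward unchanged, mitigating vanishing gradients.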
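A minimal numerical illustration of the gradient argument, not an experiment from the paper. It assumes purely linear layers (ReLUs dropped so each layer's Jacobian is just a matrix) and small random weights at an arbitrarily chosen scale. A plain layer x -> Wx has Jacobian W, so a deep plain stack multiplies many small matrices together; a residual layer x -> x + Wx has Jacobian I + W, and the identity term keeps the end-to-end Jacobian from collapsing.

```python
import numpy as np

def end_to_end_jacobian(weights, residual: bool) -> np.ndarray:
    """Jacobian d(output)/d(input) of a stack of *linear* layers (ReLU omitted).

    Plain layer:    x -> W x      has Jacobian W
    Residual layer: x -> x + W x  has Jacobian I + W
    """
    d = weights[0].shape[0]
    jac = np.eye(d)
    for w in weights:
        layer_jac = np.eye(d) + w if residual else w
        jac = layer_jac @ jac  # chain rule: later layers multiply on the left
    return jac

rng = np.random.default_rng(0)
d, depth = 16, 50
# Small random weights; the 0.01 scale is an illustrative assumption.
weights = [0.01 * rng.standard_normal((d, d)) for _ in range(depth)]

# Spectral norm (largest singular value) of each end-to-end Jacobian
plain = np.linalg.norm(end_to_end_jacobian(weights, residual=False), 2)
resid = np.linalg.norm(end_to_end_jacobian(weights, residual=True), 2)
print(f"plain: {plain:.2e}  residual: {resid:.2e}")
```

The plain stack's Jacobian shrinks geometrically with depth, while the residual stack's stays on the order of 1.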
Impact
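ResNet won the ILSVRC 2015 (ImageNet) classification task; its deepest model had 152 layers. Without residual connections, plain networks suffer from the degradation problem at this kind of depth: in the paper, even a 34-layer plain net has higher training error than an 18-layer one.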