Mukund's Portfolio | Deep Residual Learning for Image Recognition

Authors

He et al.

Conference

CVPR 2016

Problem

Deeper neural networks are harder to train. The paper highlights a degradation problem: as depth increases, accuracy saturates and then degrades. This is not caused by overfitting, because the deeper networks also show higher training error.

Solution: Residual Learning

Instead of learning the desired mapping $H(x)$ directly, learn the residual $F(x) = H(x) - x$.

Traditional: H(x)
ResNet:      F(x) = H(x) - x, so output = F(x) + x

This is implemented via skip connections (identity mappings).

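def residual_block(x, conv1, conv2, relu):
    """Basic residual block: relu(F(x) + x), where F is conv -> relu -> conv."""
    # conv1 and conv2 are two separate weight layers (batch norm omitted for brevity)
    residual = x       # keep the input for the identity shortcut
    x = conv1(x)
    x = relu(x)
    x = conv2(x)       # x now holds F(x)
    x = x + residual   # Skip connection: F(x) + x, shapes must match
    x = relu(x)        # the final nonlinearity comes after the addition
    return x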
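
A minimal sketch of how such a block behaves when its weight layers output zero, using only the Python standard library. The helper names (`linear`, `toy_residual_block`, `plain_block`) are hypothetical and chosen for illustration. Small dense layers stand in for the convolutions, and the final ReLU is left out so that negative inputs also pass through unchanged.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Dense layer without bias: one output per row of weights."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def plain_block(v, w1, w2):
    """Plain stack: the block must produce H(x) on its own."""
    return linear(w2, relu(linear(w1, v)))

def toy_residual_block(v, w1, w2):
    """Residual stack: the layers produce F(x), and the input is added back."""
    f = plain_block(v, w1, w2)             # F(x)
    return [a + b for a, b in zip(f, v)]   # F(x) + x

n = 3
zeros = [[0.0] * n for _ in range(n)]      # all-zero weights, so F(x) = 0
x = [1.0, -2.0, 3.0]

residual_out = toy_residual_block(x, zeros, zeros)
plain_out = plain_block(x, zeros, zeros)
```

With all-zero weights the residual block returns its input unchanged, while the plain block discards the input entirely. The plain block would have to learn specific non-zero weights to approximate the identity.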

Why It Works

If the optimal mapping is close to identity, it's easier to learn $F(x) = 0$ than to learn $H(x) = x$ from scratch.
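Because the block output is $F(x) + x$, its derivative with respect to $x$ contains an identity term. Gradients can therefore flow directly through the skip connections, which mitigates vanishing gradients.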
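
The gradient argument can be illustrated with a toy scalar model. This is a sketch under simplifying assumptions, not the paper's experiment: each layer is assumed to be $f(x) = \tanh(w x)$ with a shared scalar weight, and the function name `input_gradient` is hypothetical. For a plain stack the chain rule multiplies the local derivatives $f'(x)$. For a residual stack each factor becomes $1 + f'(x)$.

```python
import math

def input_gradient(x, weight, depth, residual):
    """Return d(output)/d(input) for a stack of scalar layers f(x) = tanh(weight * x)."""
    grad = 1.0
    for _ in range(depth):
        f = math.tanh(weight * x)
        local = weight * (1.0 - f * f)   # f'(x) by the chain rule
        if residual:
            x = x + f                    # residual layer: x + f(x)
            grad *= 1.0 + local          # derivative carries an identity term
        else:
            x = f                        # plain layer: f(x)
            grad *= local
    return grad

# Same toy settings for both stacks: 30 layers, weight 0.5, input 0.5.
plain_grad = input_gradient(0.5, 0.5, 30, residual=False)
resnet_grad = input_gradient(0.5, 0.5, 30, residual=True)
```

In this setup every plain factor is below 0.5, so `plain_grad` shrinks geometrically with depth. Every residual factor is at least 1, so `resnet_grad` cannot vanish. Real networks are not this simple, but the identity term in each factor is the same mechanism.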

Impact

ResNet won the ILSVRC 2015 (ImageNet) classification task with networks up to 152 layers deep. Plain networks of comparable depth suffered from the degradation problem and had not been trained effectively.