CVPR 2016

Deep Residual Learning for Image Recognition

Computer Vision CNN

Authors

He et al.

Conference

CVPR 2016

Problem

Deeper neural networks are harder to train. The paper identifies a degradation problem: as depth increases, accuracy saturates and then degrades. This is not caused by overfitting, because the deeper network also has higher training error.

Solution: Residual Learning

Instead of learning H(x), learn the residual F(x) = H(x) - x.

Traditional: H(x)
ResNet:      F(x) = H(x) - x, so output = F(x) + x

This is implemented with skip (shortcut) connections that carry x unchanged (an identity mapping) and add it to the output of the stacked layers.

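from torch.nn.functional import conv2d, relu

def residual_block(x, w1, w2):
    """Basic residual block. w1, w2: 3x3 conv weights of shape (C, C, 3, 3).

    Batch normalization, used after each conv in the paper, is omitted here.
    """
    residual = x                    # Keep the input for the skip connection
    x = conv2d(x, w1, padding=1)    # padding=1 preserves the spatial size
    x = relu(x)
    x = conv2d(x, w2, padding=1)    # No activation before the addition
    x = x + residual                # Skip connection: F(x) + x
    x = relu(x)
    return x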
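
The same structure can be exercised without a deep learning framework. The sketch below is a minimal 1-D analogue in pure Python, not the paper's implementation: `conv1d_same`, `residual_block_1d`, and the kernel values are hypothetical names and numbers chosen for illustration, and the padding scheme assumes odd-length kernels. It shows why both convolutions must preserve the input shape: the element-wise addition only works if the residual branch and the skip path have the same length.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation, zero-padded so the output has len(x) elements.

    Assumes an odd-length kernel (illustrative helper, not a library call).
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]


def residual_block_1d(x, kernel1, kernel2):
    """conv -> relu -> conv -> add input -> relu, on a list of floats."""
    residual = x
    out = conv1d_same(x, kernel1)
    out = relu(out)
    out = conv1d_same(out, kernel2)
    out = [a + b for a, b in zip(out, residual)]  # Skip connection: F(x) + x
    return relu(out)


signal = [1.0, 2.0, 3.0, 4.0]
out = residual_block_1d(signal, [0.0, 1.0, 0.0], [0.5, 0.0, -0.5])
print(out)
```

Because the skip path adds the input back, the residual branch only has to model the difference between the desired output and the input, not the full mapping.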

Why It Works

  • If the optimal mapping is close to identity, it's easier to learn F(x) = 0 than to learn H(x) = x from scratch.
  • Gradients flow directly through skip connections, mitigating vanishing gradients.
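
Both points can be checked on a toy model. The sketch below assumes scalar layers with a tanh residual branch, F(x) = tanh(w * x), which is an illustrative simplification rather than the paper's convolutional setup; `forward_and_grad`, the depth of 50, and the weight values are hypothetical choices.

```python
import math


def forward_and_grad(x, weights, residual):
    """Run a stack of scalar layers; return (output, d output / d input).

    Plain layer:    x -> tanh(w * x)
    Residual layer: x -> tanh(w * x) + x
    """
    grad = 1.0
    for w in weights:
        branch = math.tanh(w * x)
        branch_grad = w * (1.0 - branch ** 2)  # d tanh(w * x) / dx
        if residual:
            grad *= 1.0 + branch_grad  # The skip path contributes the 1
            x = branch + x
        else:
            grad *= branch_grad
            x = branch
    return x, grad


depth = 50
zero_weights = [0.0] * depth
small_weights = [0.01] * depth

# First point: with F(x) = 0 the residual stack is the identity,
# while the plain stack maps every input to 0.
identity_out, _ = forward_and_grad(0.5, zero_weights, residual=True)
collapsed_out, _ = forward_and_grad(0.5, zero_weights, residual=False)

# Second point: gradient of the output with respect to the input.
_, plain_grad = forward_and_grad(0.5, small_weights, residual=False)
_, residual_grad = forward_and_grad(0.5, small_weights, residual=True)

print(identity_out, collapsed_out)
print(plain_grad, residual_grad)
```

With w = 0.01 over 50 layers, the plain gradient is a product of factors no larger than 0.01, so it is bounded above by 0.01^50 = 1e-100. Every factor in the residual gradient is at least 1 here, so it cannot vanish.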

Impact

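ResNet won the ILSVRC 2015 (ImageNet) classification task using networks up to 152 layers deep. Without residual connections, plain networks of that depth suffered from the degradation problem and were impractical to train.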