Authors
Simonyan, Zisserman
Conference
ICLR 2015
Abstract
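VGGNet demonstrated that depth matters. By stacking small 3x3 convolution filters into deep networks (16-19 weight layers), the authors achieved state-of-the-art performance on ImageNet, placing first in localization and second in classification at ILSVRC 2014.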
Key Insight
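A stack of three 3x3 conv layers has the same receptive field as one 7x7 layer, but with:
- More non-linearities (three ReLUs instead of one), making the decision function more discriminative
- Fewer parameters (27C^2 vs. 49C^2 weights for C input and output channels)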
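The comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names and the channel width of 256 are assumptions, not taken from the paper, and it assumes stride-1 convolutions with equal input and output channels and no biases.

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of num_layers stacked stride-1 convs sharing one kernel size."""
    # Each extra stride-1 layer widens the field by (kernel_size - 1) pixels.
    return 1 + num_layers * (kernel_size - 1)


def conv_weight_count(kernel_size: int, channels: int, num_layers: int = 1) -> int:
    """Weights (biases ignored) in num_layers convs, each mapping channels -> channels."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 256  # assumed width, chosen only for illustration

rf_stack = stacked_receptive_field(kernel_size=3, num_layers=3)
rf_single = stacked_receptive_field(kernel_size=7, num_layers=1)

params_stack = conv_weight_count(kernel_size=3, channels=channels, num_layers=3)
params_single = conv_weight_count(kernel_size=7, channels=channels)

print(f"receptive field: 3x(3x3) = {rf_stack}, 1x(7x7) = {rf_single}")
print(f"weights:         3x(3x3) = {params_stack:,}, 1x(7x7) = {params_single:,}")
print(f"weight ratio:    {params_stack / params_single:.2f}")
```

Under these assumptions both configurations cover a 7x7 input window, while the stacked version uses 27/49 (about 55%) of the weights and applies three non-linearities instead of one.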
Legacy
VGG-16 and VGG-19 became the go-to feature extractors for transfer learning before ResNet.