Convolutional Neural Networks & Deep Learning Architectures
Master convolutions from first principles—dimensions, receptive fields, pooling—and the regularization that makes CNNs train well. Then survey the landmark architectures that shaped modern vision models.
What you’ll learn
- Convolutions, precisely: kernels, padding/stride/dilation, output shapes, parameter counts, and receptive fields (see the arithmetic sketch after this list).
- Pooling & downsampling: max/avg pooling, strided convs, when to use which.
- Invariance & equivariance: why convolutions generalize spatially and when they fail.
- Regularization that works for vision: data augmentation (crop/flip/color jitter, CutMix/MixUp), dropout2d, weight decay, label smoothing, stochastic depth.
- Normalization choices: BatchNorm vs. GroupNorm; where to place them and effects on training.
- Architectural patterns: from LeNet & AlexNet to VGG, ResNet (skip connections, sketched below), Inception/Xception (multi-branch & depthwise separable), FCNs and U-Net for dense prediction.
- Transfer learning: when to freeze, partial fine-tuning, discriminative learning rates, and resizing tricks for higher resolution.
- Evaluation & diagnostics: top-1/top-5 accuracy, confusion matrices, mIoU/Dice for segmentation; spotting over/underfitting and data leakage.
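The convolution arithmetic mentioned in the first bullet reduces to a few short formulas. Here is a minimal sketch, assuming PyTorch's floor convention for output sizes; the layer stack and the `conv_out`/`conv_params` helper names are illustrative, not part of the course materials.

```python
import math

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output spatial size of a conv/pool layer (floor convention, as in PyTorch)."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1

def conv_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a 2D conv: c_out * c_in * k * k weights, plus c_out biases."""
    return c_out * c_in * kernel * kernel + (c_out if bias else 0)

print(conv_out(32, kernel=3, padding=1))   # 32: a 3x3 "same" conv keeps the spatial size
print(conv_params(3, 64, 3))               # 1792 parameters for a 3->64 channel 3x3 conv

# Receptive field grows layer by layer: rf += (k - 1) * jump, then jump *= stride.
rf, jump = 1, 1
for k, s in [(3, 1), (3, 1), (2, 2), (3, 1)]:   # (kernel, stride) of each layer, illustrative
    rf += (k - 1) * jump
    jump *= s
print(rf)   # receptive field of one unit in the final feature map: 10
```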
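The skip-connection and normalization bullets can be made concrete with a small residual block. This is a hedged sketch, not the exact blocks used in the course notebooks; the channel count, the group count of 8, and the `norm` switch are assumptions for illustration.

```python
import torch
import torch.nn as nn

def make_norm(channels, kind="batch"):
    # BatchNorm normalizes over the batch dimension; GroupNorm over channel groups,
    # so GroupNorm behaves the same at very small batch sizes.
    return nn.BatchNorm2d(channels) if kind == "batch" else nn.GroupNorm(8, channels)

class BasicBlock(nn.Module):
    """Conv -> Norm -> ReLU -> Conv -> Norm, with an identity skip connection."""
    def __init__(self, channels, norm="batch"):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.norm1 = make_norm(channels, norm)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.norm2 = make_norm(channels, norm)

    def forward(self, x):
        out = torch.relu(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return torch.relu(out + x)   # skip connection: add the input back before the activation

x = torch.randn(4, 64, 32, 32)
print(BasicBlock(64, norm="group")(x).shape)   # torch.Size([4, 64, 32, 32])
```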
Hands-on application
- Build a small CNN from scratch (PyTorch/NumPy) and verify output shapes & receptive fields at each layer.
- Train ResNet-18 on CIFAR-10; compare SGD+Momentum vs. AdamW, add Cosine LR + warmup, and measure accuracy/time (optimizer/scheduler sketch below).
- Fine-tune a pretrained backbone (e.g., ResNet/MobileNet) on your custom dataset with transfer learning best practices (freezing/discriminative-LR sketch below).
- Implement U-Net (or load a reference) and train on a segmentation dataset (e.g., Oxford-IIIT Pets masks); report mIoU/Dice (metric sketch below).
- Experiment ledger: toggle BatchNorm/GroupNorm, data aug recipes, and regularizers; log results and plot learning curves.
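For the ResNet-18/CIFAR-10 exercise, the optimizer and schedule comparison might look like the sketch below. The learning rates, weight decays, and epoch counts are placeholders to tune, not recommended values, and the torchvision ResNet-18 is used as-is rather than a CIFAR-specific variant.

```python
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR
from torchvision.models import resnet18

model = resnet18(num_classes=10)   # CIFAR-10 head; trained from scratch here
epochs, warmup = 100, 5

# Option A: SGD with momentum.  Option B: AdamW (decoupled weight decay).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-2)

# Linear warmup for a few epochs, then cosine decay over the remaining epochs.
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=0.1, total_iters=warmup),
                CosineAnnealingLR(optimizer, T_max=epochs - warmup)],
    milestones=[warmup],
)

for epoch in range(epochs):
    # ... one pass over the training loader, with optimizer.step() per batch ...
    scheduler.step()   # stepped per epoch here; per-iteration stepping also works
```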
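A hedged sketch of the fine-tuning recipe: freeze most of a pretrained backbone, replace the head, and give earlier layers a smaller learning rate than the new head. It assumes a recent torchvision weight enum (`ResNet18_Weights`); the class count, the choice of `layer4` as the unfrozen stage, and the learning rates are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

num_classes = 5                                           # your custom dataset
model = resnet18(weights=ResNet18_Weights.DEFAULT)        # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, num_classes)   # new classification head

# Freeze everything, then unfreeze the last stage plus the head for partial fine-tuning.
for p in model.parameters():
    p.requires_grad = False
for p in list(model.layer4.parameters()) + list(model.fc.parameters()):
    p.requires_grad = True

# Discriminative learning rates: small LR for pretrained layers, larger LR for the new head.
optimizer = torch.optim.AdamW([
    {"params": model.layer4.parameters(), "lr": 1e-4},
    {"params": model.fc.parameters(),     "lr": 1e-3},
], weight_decay=1e-4)
```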
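The segmentation metrics reduce to set-overlap counts: IoU is |P∩T| / |P∪T| per class and Dice is 2|P∩T| / (|P| + |T|). A minimal per-class sketch, assuming integer label maps and a small smoothing constant (both the function name and the smoothing convention are assumptions):

```python
import torch

def iou_and_dice(pred, target, num_classes, eps=1e-7):
    """Mean IoU and mean Dice from integer label maps of shape (N, H, W)."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        inter = (p & t).sum().float()
        union = (p | t).sum().float()
        ious.append(((inter + eps) / (union + eps)).item())
        # Dice relates to IoU by Dice = 2*IoU / (1 + IoU).
        dices.append(((2 * inter + eps) / (p.sum() + t.sum() + eps)).item())
    return sum(ious) / num_classes, sum(dices) / num_classes

pred   = torch.randint(0, 3, (2, 64, 64))
target = torch.randint(0, 3, (2, 64, 64))
print(iou_and_dice(pred, target, num_classes=3))
```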
Prerequisites
- Comfortable with forward/backprop and basic neural nets (MLP).
- Python and PyTorch experience; tensors, autograd, and training loops.
- Basic linear algebra and calculus (convolutions, chain rule).
Who it’s for
- Developers with a background in ML who want to understand and apply CNNs and modern architectures.
Format
- Duration: ~6–8 hours, fully self-paced.
- Structure: 6 modules, each pairing a concise lesson with a worked example and a coding notebook.