Convolutional Neural Networks & Deep Learning Architectures
Master convolutions from first principles—dimensions, receptive fields, pooling—and the regularization that makes CNNs train well. Then survey the landmark architectures that shaped modern vision models.
What you’ll learn
- Convolutions, precisely: kernels, padding/stride/dilation, output shapes, parameter counts, and receptive fields (see the arithmetic sketch after this list).
- Pooling & downsampling: max/avg pooling, strided convs, when to use which.
- Invariance & equivariance: why convolutions generalize spatially and when they fail.
- Regularization that works for vision: data augmentation (crop/flip/color jitter, CutMix/MixUp), dropout2d, weight decay, label smoothing, stochastic depth.
- Normalization choices: BatchNorm vs. GroupNorm; where to place them and effects on training.
- Architectural patterns: from LeNet & AlexNet to VGG, ResNet (skip connections, sketched below), Inception/Xception (multi-branch & depthwise separable), FCNs and U-Net for dense prediction.
- Transfer learning: when to freeze, partial fine-tuning, discriminative learning rates, and resizing tricks for higher resolution.
- Evaluation & diagnostics: top-1/top-5 accuracy, confusion matrices, mIoU/Dice for segmentation; spotting over/underfitting and data leakage.
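The convolution arithmetic mentioned in the first bullet reduces to a few short formulas. Here is a minimal sketch, assuming PyTorch's floor convention for output sizes; the layer stack and the `conv_out`/`conv_params` helper names are illustrative, not part of the course materials.

```python
import math

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output spatial size of a conv/pool layer (floor convention, as in PyTorch)."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1

def conv_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a 2D conv: c_out * c_in * k * k weights, plus c_out biases."""
    return c_out * c_in * kernel * kernel + (c_out if bias else 0)

print(conv_out(32, kernel=3, padding=1))   # 32: a 3x3 "same" conv keeps the spatial size
print(conv_params(3, 64, 3))               # 1792 parameters for a 3->64 channel 3x3 conv

# Receptive field grows layer by layer: rf += (k - 1) * jump, then jump *= stride.
rf, jump = 1, 1
for k, s in [(3, 1), (3, 1), (2, 2), (3, 1)]:   # (kernel, stride) of each layer, illustrative
    rf += (k - 1) * jump
    jump *= s
print(rf)   # receptive field of one unit in the final feature map: 10
```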
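The skip-connection and normalization bullets can be made concrete with a small residual block. This is a hedged sketch, not the exact blocks used in the course notebooks; the channel count, the group count of 8, and the `norm` switch are assumptions for illustration.

```python
import torch
import torch.nn as nn

def make_norm(channels, kind="batch"):
    # BatchNorm normalizes over the batch dimension; GroupNorm over channel groups,
    # so GroupNorm behaves the same at very small batch sizes.
    return nn.BatchNorm2d(channels) if kind == "batch" else nn.GroupNorm(8, channels)

class BasicBlock(nn.Module):
    """Conv -> Norm -> ReLU -> Conv -> Norm, with an identity skip connection."""
    def __init__(self, channels, norm="batch"):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.norm1 = make_norm(channels, norm)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.norm2 = make_norm(channels, norm)

    def forward(self, x):
        out = torch.relu(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return torch.relu(out + x)   # skip connection: add the input back before the activation

x = torch.randn(4, 64, 32, 32)
print(BasicBlock(64, norm="group")(x).shape)   # torch.Size([4, 64, 32, 32])
```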
Hands-on application
- Build a small CNN from scratch (PyTorch/NumPy) and verify output shapes & receptive fields at each layer.
- Train ResNet-18 on CIFAR-10; compare SGD+Momentum vs. AdamW, add Cosine LR + warmup, and measure accuracy/time (optimizer/scheduler sketch below).
- Fine-tune a pretrained backbone (e.g., ResNet/MobileNet) on your custom dataset with transfer learning best practices (freezing/discriminative-LR sketch below).
- Implement U-Net (or load a reference) and train on a segmentation dataset (e.g., Oxford-IIIT Pets masks); report mIoU/Dice (metric sketch below).
- Experiment ledger: toggle BatchNorm/GroupNorm, data aug recipes, and regularizers; log results and plot learning curves.
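For the ResNet-18/CIFAR-10 exercise, the optimizer and schedule comparison might look like the sketch below. The learning rates, weight decays, and epoch counts are placeholders to tune, not recommended values, and the torchvision ResNet-18 is used as-is rather than a CIFAR-specific variant.

```python
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR
from torchvision.models import resnet18

model = resnet18(num_classes=10)   # CIFAR-10 head; trained from scratch here
epochs, warmup = 100, 5

# Option A: SGD with momentum.  Option B: AdamW (decoupled weight decay).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-2)

# Linear warmup for a few epochs, then cosine decay over the remaining epochs.
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=0.1, total_iters=warmup),
                CosineAnnealingLR(optimizer, T_max=epochs - warmup)],
    milestones=[warmup],
)

for epoch in range(epochs):
    # ... one pass over the training loader, with optimizer.step() per batch ...
    scheduler.step()   # stepped per epoch here; per-iteration stepping also works
```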
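A hedged sketch of the fine-tuning recipe: freeze most of a pretrained backbone, replace the head, and give earlier layers a smaller learning rate than the new head. It assumes a recent torchvision weight enum (`ResNet18_Weights`); the class count, the choice of `layer4` as the unfrozen stage, and the learning rates are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

num_classes = 5                                           # your custom dataset
model = resnet18(weights=ResNet18_Weights.DEFAULT)        # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, num_classes)   # new classification head

# Freeze everything, then unfreeze the last stage plus the head for partial fine-tuning.
for p in model.parameters():
    p.requires_grad = False
for p in list(model.layer4.parameters()) + list(model.fc.parameters()):
    p.requires_grad = True

# Discriminative learning rates: small LR for pretrained layers, larger LR for the new head.
optimizer = torch.optim.AdamW([
    {"params": model.layer4.parameters(), "lr": 1e-4},
    {"params": model.fc.parameters(),     "lr": 1e-3},
], weight_decay=1e-4)
```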
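The segmentation metrics reduce to set-overlap counts: IoU is |P∩T| / |P∪T| per class and Dice is 2|P∩T| / (|P| + |T|). A minimal per-class sketch, assuming integer label maps and a small smoothing constant (both the function name and the smoothing convention are assumptions):

```python
import torch

def iou_and_dice(pred, target, num_classes, eps=1e-7):
    """Mean IoU and mean Dice from integer label maps of shape (N, H, W)."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        inter = (p & t).sum().float()
        union = (p | t).sum().float()
        ious.append(((inter + eps) / (union + eps)).item())
        # Dice relates to IoU by Dice = 2*IoU / (1 + IoU).
        dices.append(((2 * inter + eps) / (p.sum() + t.sum() + eps)).item())
    return sum(ious) / num_classes, sum(dices) / num_classes

pred   = torch.randint(0, 3, (2, 64, 64))
target = torch.randint(0, 3, (2, 64, 64))
print(iou_and_dice(pred, target, num_classes=3))
```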
Prerequisites
- Comfortable with forward/backprop and basic neural nets (MLP).
- Python and PyTorch experience; tensors, autograd, and training loops.
- Basic linear algebra and calculus (convolutions, chain rule).
Who it’s for
- Developers with a background in ML who want to understand and apply CNNs and modern architectures.
Format
- Duration: ~6–8 hours, fully self-paced.
- Structure: 6 modules, each pairing a concise lesson with a worked example and a coding notebook.