Train Neural Networks

Make training faster, more stable, and more reliable with practical techniques you can apply immediately.

What you’ll learn

  • Use mini-batches effectively: choose batch size, shuffle/sampler strategies, and gradient accumulation.
  • Pick and tune optimizers: SGD + momentum/Nesterov vs. Adam/AdamW vs. RMSProp, and when each shines (see the optimizer sketch after this list).
  • Get weight initialization right: Xavier/He, fan-in/out, and activation-aware scaling (see the initialization sketch below).
  • Apply normalization: BatchNorm vs. LayerNorm (where to place them, effects on gradient flow and batch size); see the placement sketch below.
  • Add regularization that works: weight decay, dropout, early stopping, and label smoothing.
  • Improve stability & speed: gradient clipping, mixed precision (fp16/bfloat16) with loss scaling, reproducible seeds.
  • Read learning curves to diagnose under/overfitting, vanishing/exploding gradients, and data/label issues.
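
A minimal sketch of the optimizer options listed above, in PyTorch. The model, layer sizes, and hyperparameters here are illustrative placeholders, not tuned recommendations.

```python
import torch
import torch.nn as nn

# Placeholder model, used only so the optimizers have parameters to manage.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# SGD with Nesterov momentum: a strong choice for many vision models when
# the learning rate is tuned and paired with a schedule.
sgd = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, nesterov=True, weight_decay=5e-4
)

# AdamW: decoupled weight decay; a common default when you want fast
# progress with little tuning (e.g. transformers).
adamw = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

# RMSProp: per-parameter adaptive scaling; historically popular for RNNs.
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3, alpha=0.99)
```

Weight decay is passed to the optimizer rather than added to the loss; with AdamW the decay is decoupled from the adaptive learning rate, which is why it is usually preferred over plain Adam when weight decay matters.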
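
A sketch of activation-aware initialization for a ReLU MLP; init_weights is a hypothetical helper name, applied to every submodule with Module.apply.

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Hypothetical helper: He (Kaiming) init for ReLU layers, zero biases."""
    if isinstance(module, nn.Linear):
        # fan_in scaling keeps activation variance roughly constant through
        # ReLU layers; for tanh/sigmoid networks, xavier_uniform_ is the
        # usual alternative.
        nn.init.kaiming_normal_(module.weight, mode="fan_in", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)  # runs the helper on every submodule, including nested ones
```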
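
A sketch contrasting where the two normalization layers sit in a small MLP block; the sizes are illustrative, and the placement follows the common Linear → Norm → Activation pattern.

```python
import torch
import torch.nn as nn

hidden = 256

# BatchNorm normalizes each feature across the batch, so its statistics
# become noisy at small batch sizes.
batchnorm_block = nn.Sequential(
    nn.Linear(784, hidden),
    nn.BatchNorm1d(hidden),
    nn.ReLU(),
)

# LayerNorm normalizes across the features of each individual sample, so
# it behaves the same at any batch size, including batch size 1.
layernorm_block = nn.Sequential(
    nn.Linear(784, hidden),
    nn.LayerNorm(hidden),
    nn.ReLU(),
)

x = torch.randn(32, 784)         # stand-in mini-batch of 32 samples
print(batchnorm_block(x).shape)  # torch.Size([32, 256])
print(layernorm_block(x).shape)  # torch.Size([32, 256])
```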

Hands-on application

  • Build a robust PyTorch training loop with mini-batching, LR schedulers, and optimizer switching (a minimal loop sketch follows this list).
  • Compare SGD+Momentum vs. AdamW on a benchmark task; track accuracy/loss and time/epoch.
  • Toggle BatchNorm/LayerNorm and initializations to see effects on convergence.
  • Enable mixed precision (autocast + GradScaler) and gradient clipping; measure speedups and stability (see the mixed-precision sketch after this list).
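
A minimal sketch of such a loop, assuming a small classifier and random tensors standing in for a real dataset; the accumulation factor, scheduler, and label-smoothing value are illustrative.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # seed for repeatable runs

# Random tensors stand in for a real dataset in this sketch.
data = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=64, shuffle=True)  # shuffled mini-batches

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing as regularization
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

accum_steps = 4  # gradient accumulation: effective batch size 64 * 4 = 256

for epoch in range(10):
    model.train()
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = criterion(model(x), y) / accum_steps  # scale so accumulated grads average out
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    scheduler.step()  # one scheduler step per epoch for CosineAnnealingLR
```

Swapping AdamW for SGD+Momentum is a one-line change here, which is what makes the side-by-side comparison in the second exercise straightforward.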
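
A sketch of fp16 mixed precision with loss scaling and gradient clipping on a single mini-batch, assuming a CUDA device; the clipping threshold and stand-in tensors are illustrative. With bfloat16 on recent GPUs the scaler is usually unnecessary.

```python
import torch
import torch.nn as nn

device = "cuda"  # assumes a CUDA GPU is available
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scaler = torch.cuda.amp.GradScaler()  # loss scaling keeps small fp16 gradients from underflowing

x = torch.randn(64, 784, device=device)        # stand-in mini-batch
y = torch.randint(0, 10, (64,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(dtype=torch.float16):  # run the forward pass in mixed precision
    loss = criterion(model(x), y)

scaler.scale(loss).backward()  # backward on the scaled loss
scaler.unscale_(optimizer)     # unscale first so the clip threshold is in true gradient units
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)         # skips the update if any gradient overflowed
scaler.update()
```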

Prerequisites

  • Comfort with forward/backprop and a simple MLP (from Neural Networks 101).
  • Python experience (NumPy, PyTorch) and basic calculus.

Who it’s for

  • Developers who can build a small network and want to train it well at scale.
  • Practitioners seeking repeatable, stable training recipes.

Format

  • Duration: ~6–8 hours, fully self-paced.
  • Structure: 6 modules, each with a brief lesson, a recipe checklist, and a coding notebook.