Train Neural Networks
Make training faster, more stable, and more reliable with practical techniques you can apply immediately.
What you’ll learn
- Use mini-batches effectively: choose a batch size, set up shuffling and samplers, and apply gradient accumulation.
- Pick and tune optimizers: SGD + momentum/Nesterov vs. Adam/AdamW vs. RMSProp—when each shines.
- Get weight initialization right: Xavier/He, fan-in/fan-out, and activation-aware scaling (sketched in the code after this list).
- Apply normalization: BatchNorm vs. LayerNorm (where to place them, effects on gradient flow, and sensitivity to batch size).
- Add regularization that works: weight decay, dropout, early stopping, and label smoothing.
- Improve stability & speed: gradient clipping, mixed precision (fp16 with loss scaling, or bfloat16), and reproducible seeds.
- Read learning curves to diagnose under/overfitting, vanishing/exploding gradients, and data/label issues.
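As a taste of the recipes, here is a minimal sketch of activation-aware initialization and the two optimizer families compared in the course. The layer sizes, learning rates, weight-decay value, and label-smoothing factor are illustrative placeholders, not tuned settings.

```python
import torch
import torch.nn as nn

# A small ReLU MLP used purely for illustration (layer sizes are arbitrary).
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Activation-aware initialization: He (Kaiming) scaling suits ReLU layers;
# Xavier (Glorot) would be the usual choice for tanh/sigmoid.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

# Label smoothing as a lightweight regularizer (value is a placeholder).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# The two optimizer choices compared in the course; pick one per run.
opt_sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
opt_adamw = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
```

Switching between opt_sgd and opt_adamw (each with a learning rate tuned for it) is enough to reproduce the optimizer comparison in the hands-on section below.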
Hands-on application
- Build a robust PyTorch training loop with mini-batching, LR schedulers, and optimizer switching.
- Compare SGD+Momentum vs. AdamW on a benchmark task; track accuracy/loss and time/epoch.
- Toggle BatchNorm/LayerNorm and initializations to see effects on convergence.
- Enable mixed precision (autocast + GradScaler) and gradient clipping; measure speedups and stability (see the loop sketch after this list).
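The loop below is a minimal sketch of the kind of training step the notebooks assemble, assuming a CUDA device, a classification loss, and PyTorch's torch.cuda.amp API. The function name, clipping norm, label-smoothing value, and accumulation default are illustrative, not the course's exact recipe.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

def train_one_epoch(model, loader, optimizer, scheduler, device="cuda",
                    max_norm=1.0, accum_steps=1):
    """One epoch with fp16 autocast, loss scaling, gradient clipping,
    and optional gradient accumulation (names/defaults are illustrative)."""
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    scaler = GradScaler()
    model.train()
    optimizer.zero_grad(set_to_none=True)

    for step, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        with autocast():                        # fp16 forward pass
            loss = criterion(model(x), y) / accum_steps
        scaler.scale(loss).backward()           # scaled loss avoids fp16 underflow

        if (step + 1) % accum_steps == 0:
            scaler.unscale_(optimizer)          # unscale before clipping
            nn.utils.clip_grad_norm_(model.parameters(), max_norm)
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
            scheduler.step()                    # assumes a per-step LR schedule
```

Note the ordering: gradients are unscaled before clip_grad_norm_ so the clip threshold applies to true gradient norms rather than scaled ones.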
Prerequisites
- Comfort with forward/backprop and a simple MLP (from Neural Networks 101).
- Python experience (NumPy, PyTorch) and basic calculus.
Who it’s for
- Developers who can build a small network and want to train it well at scale.
- Practitioners seeking repeatable, stable training recipes.
Format
- Duration: ~6–8 hours, fully self-paced.
- Structure: 6 modules, each with a brief lesson, a recipe checklist, and a coding notebook.