Train Neural Networks

Make training faster, more stable, and more reliable with practical techniques you can apply immediately.

What you’ll learn

  • Use mini-batches effectively: choose batch size, shuffle/sampler strategies, and gradient accumulation.
  • Pick and tune optimizers: SGD + momentum/Nesterov vs. Adam/AdamW vs. RMSProp, and when each shines (see the optimizer sketch after this list).
  • Get weight initialization right: Xavier/He, fan-in/out, and activation-aware scaling (see the initialization sketch below).
  • Apply normalization: BatchNorm vs. LayerNorm (where to place them, effects on gradient flow and batch size); see the placement sketch below.
  • Add regularization that works: weight decay, dropout, early stopping, and label smoothing.
  • Improve stability & speed: gradient clipping, mixed precision (fp16/bfloat16) with loss scaling, reproducible seeds.
  • Read learning curves to diagnose under/overfitting, vanishing/exploding gradients, and data/label issues.
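
A minimal sketch of the optimizer options listed above, in PyTorch. The model, layer sizes, and hyperparameters here are illustrative placeholders, not tuned recommendations.

```python
import torch
import torch.nn as nn

# Placeholder model, used only so the optimizers have parameters to manage.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# SGD with Nesterov momentum: a strong choice for many vision models when
# the learning rate is tuned and paired with a schedule.
sgd = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, nesterov=True, weight_decay=5e-4
)

# AdamW: decoupled weight decay; a common default when you want fast
# progress with little tuning (e.g. transformers).
adamw = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

# RMSProp: per-parameter adaptive scaling; historically popular for RNNs.
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3, alpha=0.99)
```

Weight decay is passed to the optimizer rather than added to the loss; with AdamW the decay is decoupled from the adaptive learning rate, which is why it is usually preferred over plain Adam when weight decay matters.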
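
A sketch of activation-aware initialization for a ReLU MLP; init_weights is a hypothetical helper name, applied to every submodule with Module.apply.

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Hypothetical helper: He (Kaiming) init for ReLU layers, zero biases."""
    if isinstance(module, nn.Linear):
        # fan_in scaling keeps activation variance roughly constant through
        # ReLU layers; for tanh/sigmoid networks, xavier_uniform_ is the
        # usual alternative.
        nn.init.kaiming_normal_(module.weight, mode="fan_in", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)  # runs the helper on every submodule, including nested ones
```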
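
A sketch contrasting where the two normalization layers sit in a small MLP block; the sizes are illustrative, and the placement follows the common Linear → Norm → Activation pattern.

```python
import torch
import torch.nn as nn

hidden = 256

# BatchNorm normalizes each feature across the batch, so its statistics
# become noisy at small batch sizes.
batchnorm_block = nn.Sequential(
    nn.Linear(784, hidden),
    nn.BatchNorm1d(hidden),
    nn.ReLU(),
)

# LayerNorm normalizes across the features of each individual sample, so
# it behaves the same at any batch size, including batch size 1.
layernorm_block = nn.Sequential(
    nn.Linear(784, hidden),
    nn.LayerNorm(hidden),
    nn.ReLU(),
)

x = torch.randn(32, 784)         # stand-in mini-batch of 32 samples
print(batchnorm_block(x).shape)  # torch.Size([32, 256])
print(layernorm_block(x).shape)  # torch.Size([32, 256])
```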

Hands-on application

  • Build a robust PyTorch training loop with mini-batching, LR schedulers, and optimizer switching (a minimal loop sketch follows this list).
  • Compare SGD+Momentum vs. AdamW on a benchmark task; track accuracy/loss and time/epoch.
  • Toggle BatchNorm/LayerNorm and initializations to see effects on convergence.
  • Enable mixed precision (autocast + GradScaler) and gradient clipping; measure speedups and stability (see the mixed-precision sketch after this list).
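
A minimal sketch of such a loop, assuming a small classifier and random tensors standing in for a real dataset; the accumulation factor, scheduler, and label-smoothing value are illustrative.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # seed for repeatable runs

# Random tensors stand in for a real dataset in this sketch.
data = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=64, shuffle=True)  # shuffled mini-batches

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing as regularization
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

accum_steps = 4  # gradient accumulation: effective batch size 64 * 4 = 256

for epoch in range(10):
    model.train()
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = criterion(model(x), y) / accum_steps  # scale so accumulated grads average out
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    scheduler.step()  # one scheduler step per epoch for CosineAnnealingLR
```

Swapping AdamW for SGD+Momentum is a one-line change here, which is what makes the side-by-side comparison in the second exercise straightforward.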
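
A sketch of fp16 mixed precision with loss scaling and gradient clipping on a single mini-batch, assuming a CUDA device; the clipping threshold and stand-in tensors are illustrative. With bfloat16 on recent GPUs the scaler is usually unnecessary.

```python
import torch
import torch.nn as nn

device = "cuda"  # assumes a CUDA GPU is available
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scaler = torch.cuda.amp.GradScaler()  # loss scaling keeps small fp16 gradients from underflowing

x = torch.randn(64, 784, device=device)        # stand-in mini-batch
y = torch.randint(0, 10, (64,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(dtype=torch.float16):  # run the forward pass in mixed precision
    loss = criterion(model(x), y)

scaler.scale(loss).backward()  # backward on the scaled loss
scaler.unscale_(optimizer)     # unscale first so the clip threshold is in true gradient units
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)         # skips the update if any gradient overflowed
scaler.update()
```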

Prerequisites

  • Comfort with forward/backprop and a simple MLP (from Neural Networks 101).
  • Python experience (NumPy, PyTorch) and basic calculus.

Who it’s for

  • Developers who can build a small network and want to train it well at scale.
  • Practitioners seeking repeatable, stable training recipes.

Format

  • Duration: ~6–8 hours, fully self-paced.
  • Structure: 6 modules, each with a brief lesson, a recipe checklist, and a coding notebook.