Sequences, Transformers & Generative Models

Model sequential data with RNNs/LSTMs, then learn attention and Transformers for long-range patterns. Finish with a friendly tour of graph neural nets and modern generative models.

What you’ll learn

  • RNNs, LSTMs, GRUs: how they handle sequences and why gradients can vanish/explode.
  • Embeddings & attention: turn tokens into vectors and focus on what matters (see the attention sketch after this list).
  • Transformers: multi-head attention, encoder/decoder basics, and when to use them.
  • Generative models: what VAEs, GANs, and diffusion models do at a high level.
  • Transfer & RL basics: reuse pretrained models; understand policies, rewards, and when RL fits.
  • Evaluation: simple metrics you’ll actually use (perplexity, BLEU/ROUGE, FID), plus how to read learning curves; perplexity gets a short sketch after this list.
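
For a taste of the attention bullet above, here’s a minimal sketch of scaled dot-product attention in plain PyTorch. The tensor shapes and toy inputs are illustrative assumptions, not the course’s code.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v):
        # q, k, v: (batch, seq_len, d_k)
        d_k = q.size(-1)
        # Similarity of every query with every key, scaled to keep the softmax well-behaved
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5
        # One attention distribution over the keys per query position
        weights = F.softmax(scores, dim=-1)
        # Each output vector is a weighted average of the value vectors
        return weights @ v

    # Toy self-attention: 2 sequences, 5 tokens each, 16-dim vectors
    x = torch.randn(2, 5, 16)
    print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([2, 5, 16])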

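Since perplexity shows up in the evaluation bullet, here’s an equally small reminder that it’s just the exponential of the average cross-entropy; the random logits and targets below stand in for a real model and held-out text.

    import torch
    import torch.nn.functional as F

    # Perplexity = exp(mean cross-entropy over the evaluated tokens).
    logits = torch.randn(10, 100)           # (num_tokens, vocab_size), random stand-in
    targets = torch.randint(0, 100, (10,))  # "true" token ids, also random
    ce = F.cross_entropy(logits, targets)   # mean negative log-likelihood in nats
    print(torch.exp(ce).item())             # perplexity of this stand-in model on these targets
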
Hands-on application

  • Train a tiny character LSTM and sample text at different temperatures (see the sampling sketch after this list).
  • Build a mini Transformer and run it on a small text task.
  • Fine-tune a pretrained model (e.g., DistilBERT) with Hugging Face.
  • Train a VAE or GAN on MNIST and generate samples.
  • Run diffusion inference with a lightweight pretrained pipeline.
  • (Optional) Try a small graph neural network on the Cora dataset or a CartPole RL agent.
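
To make “different temperatures” concrete, here’s a hedged sketch of temperature sampling from a model’s output logits; the five made-up logit values below stand in for what the course’s character LSTM would produce.

    import torch
    import torch.nn.functional as F

    def sample_next(logits, temperature=1.0):
        # Divide logits by the temperature before the softmax:
        # temperature < 1 sharpens the distribution, > 1 flattens it.
        probs = F.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1).item()

    # Made-up logits for a tiny 5-character vocabulary
    logits = torch.tensor([2.0, 1.0, 0.5, 0.1, -1.0])
    for t in (0.5, 1.0, 2.0):
        print(f"temperature={t}: sampled index {sample_next(logits, t)}")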

Prerequisites

  • Comfortable training simple neural nets in PyTorch.
  • Basic linear algebra and probability; familiarity with cross-entropy and MSE losses.
  • Some NLP familiarity (tokens, vocab) is helpful but not required.

Who it’s for

  • Developers who can train basic models and want a clear path into generative modeling.
  • Learners who prefer code-first explanations with just enough math.

Format

  • Duration: ~8–10 hours, self-paced.
  • Structure: 6 modules, each with a short lesson, a worked example, and a coding notebook.