Sequences, Transformers & Generative Models
Model sequential data with RNNs and LSTMs, then learn attention and Transformers for capturing long-range dependencies. Finish with a friendly tour of graph neural networks and modern generative models.
What you’ll learn
- RNNs, LSTMs, GRUs: how they handle sequences and why gradients can vanish/explode.
- Embeddings & attention: turn tokens into dense vectors and learn to weight the parts of the input that matter most.
- Transformers: multi-head attention, encoder/decoder basics, and when to use them (see the attention sketch after this list).
- Generative models: what VAEs, GANs, and diffusion models do at a high level.
- Transfer & RL basics: reuse pretrained models; understand policies, rewards, and when RL fits.
- Evaluation: simple metrics you’ll actually use (perplexity, BLEU/ROUGE, FID), plus how to read learning curves; a tiny perplexity example also follows below.
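To make the attention idea concrete before the Transformer module, here is a minimal sketch of a single head of scaled dot-product attention in PyTorch. The function name, tensor shapes, and random inputs are placeholders chosen for illustration, not the course's notebook code.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """One attention head: each query takes a weighted average of the values."""
    d_k = q.size(-1)
    # similarity of every query with every key, scaled to keep the softmax well-behaved
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # each row sums to 1: how much to "focus" on each token
    return weights @ v

# toy shapes: batch of 1, sequence of 4 tokens, 8-dimensional vectors
q = k = v = torch.randn(1, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 4, 8])
```

Multi-head attention simply runs several of these heads in parallel on different learned projections of the same tokens and concatenates the results.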
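Likewise, the perplexity metric from the evaluation bullet is just the exponential of the average cross-entropy you already minimize during training; the logits and targets below are random stand-ins for a real model and dataset.

```python
import torch
import torch.nn.functional as F

# random stand-ins: 10 token positions, vocabulary of 100
logits = torch.randn(10, 100)
targets = torch.randint(0, 100, (10,))

ce = F.cross_entropy(logits, targets)  # average negative log-likelihood per token
print(torch.exp(ce))                   # perplexity = exp(cross-entropy); lower is better
```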
Hands-on application
- Train a tiny character-level LSTM and sample text at different temperatures (sampling sketch after this list).
- Build a mini Transformer and run it on a small text task.
- Fine-tune a pretrained model (e.g., DistilBERT) with Hugging Face (fine-tuning sketch after this list).
- Train a VAE or GAN on MNIST and generate samples.
- Run diffusion inference with a lightweight pretrained pipeline.
- (Optional) Try a small graph neural network (Cora dataset) or a CartPole RL agent.
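Two of these exercises benefit from a quick preview. For the character LSTM, temperature only enters at sampling time: divide the model's output logits by a temperature before the softmax, then sample. A minimal sketch, with random logits standing in for a trained model and a hypothetical 65-character vocabulary:

```python
import torch
import torch.nn.functional as F

def sample_next_char(logits, temperature=1.0):
    """Sample an index from logits; lower temperature -> safer, higher -> more random."""
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

logits = torch.randn(65)  # placeholder: one logit per character in a 65-char vocab
for t in (0.5, 1.0, 1.5):
    print(t, sample_next_char(logits, temperature=t))
```

Temperatures below 1 sharpen the distribution toward the most likely characters; temperatures above 1 flatten it and produce more surprising (and more error-prone) text.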
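The DistilBERT exercise follows the standard Hugging Face Trainer recipe, sketched below under placeholder choices: the imdb dataset, the 2,000-example slice, and the hyperparameters are illustrative, not the course's exact settings.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# a small labeled text dataset, tokenized to fixed-length inputs
ds = load_dataset("imdb", split="train[:2000]")
ds = ds.map(lambda batch: tok(batch["text"], truncation=True,
                              padding="max_length", max_length=128),
            batched=True)

args = TrainingArguments(output_dir="distilbert-imdb",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ds).train()
```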
Prerequisites
- Comfortable training simple neural nets in PyTorch.
- Basic linear algebra and probability; familiarity with cross-entropy and MSE losses.
- Some NLP familiarity (tokens, vocab) is helpful but not required.
Who it’s for
- Developers who can train basic models and want a clear path into generative modeling.
- Learners who prefer code-first explanations with just enough math.
Format
- Duration: ~8–10 hours, self-paced.
- Structure: 6 modules, each with a short lesson, a worked example, and a coding notebook.