Sequences, Transformers & Generative Models
Model sequential data with RNNs and LSTMs, then learn attention and Transformers for capturing long-range dependencies. Finish with a friendly tour of graph neural networks and modern generative models.
What you’ll learn
- RNNs, LSTMs, GRUs: how they handle sequences and why gradients can vanish/explode.
- Embeddings & attention: turn tokens into dense vectors and learn to weight the parts of the input that matter most.
- Transformers: multi-head attention, encoder/decoder basics, and when to use them (see the attention sketch after this list).
- Generative models: what VAEs, GANs, and diffusion models do at a high level.
- Transfer & RL basics: reuse pretrained models; understand policies, rewards, and when RL fits.
- Evaluation: simple metrics you’ll actually use (perplexity, BLEU/ROUGE, FID), plus how to read learning curves; a tiny perplexity example also follows below.
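To make the attention idea concrete before the Transformer module, here is a minimal sketch of a single head of scaled dot-product attention in PyTorch. The function name, tensor shapes, and random inputs are placeholders chosen for illustration, not the course's notebook code.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """One attention head: each query takes a weighted average of the values."""
    d_k = q.size(-1)
    # similarity of every query with every key, scaled to keep the softmax well-behaved
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # each row sums to 1: how much to "focus" on each token
    return weights @ v

# toy shapes: batch of 1, sequence of 4 tokens, 8-dimensional vectors
q = k = v = torch.randn(1, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 4, 8])
```

Multi-head attention simply runs several of these heads in parallel on different learned projections of the same tokens and concatenates the results.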
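Likewise, the perplexity metric from the evaluation bullet is just the exponential of the average cross-entropy you already minimize during training; the logits and targets below are random stand-ins for a real model and dataset.

```python
import torch
import torch.nn.functional as F

# random stand-ins: 10 token positions, vocabulary of 100
logits = torch.randn(10, 100)
targets = torch.randint(0, 100, (10,))

ce = F.cross_entropy(logits, targets)  # average negative log-likelihood per token
print(torch.exp(ce))                   # perplexity = exp(cross-entropy); lower is better
```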
Hands-on application
- Train a tiny character-level LSTM and sample text at different temperatures (sampling sketch after this list).
- Build a mini Transformer and run it on a small text task.
- Fine-tune a pretrained model (e.g., DistilBERT) with Hugging Face (fine-tuning sketch after this list).
- Train a VAE or GAN on MNIST and generate samples.
- Run diffusion inference with a lightweight pretrained pipeline.
- (Optional) Try a small graph neural network (Cora dataset) or a CartPole RL agent.
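Two of these exercises benefit from a quick preview. For the character LSTM, temperature only enters at sampling time: divide the model's output logits by a temperature before the softmax, then sample. A minimal sketch, with random logits standing in for a trained model and a hypothetical 65-character vocabulary:

```python
import torch
import torch.nn.functional as F

def sample_next_char(logits, temperature=1.0):
    """Sample an index from logits; lower temperature -> safer, higher -> more random."""
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

logits = torch.randn(65)  # placeholder: one logit per character in a 65-char vocab
for t in (0.5, 1.0, 1.5):
    print(t, sample_next_char(logits, temperature=t))
```

Temperatures below 1 sharpen the distribution toward the most likely characters; temperatures above 1 flatten it and produce more surprising (and more error-prone) text.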
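The DistilBERT exercise follows the standard Hugging Face Trainer recipe, sketched below under placeholder choices: the imdb dataset, the 2,000-example slice, and the hyperparameters are illustrative, not the course's exact settings.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# a small labeled text dataset, tokenized to fixed-length inputs
ds = load_dataset("imdb", split="train[:2000]")
ds = ds.map(lambda batch: tok(batch["text"], truncation=True,
                              padding="max_length", max_length=128),
            batched=True)

args = TrainingArguments(output_dir="distilbert-imdb",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ds).train()
```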
Prerequisites
- Comfortable training simple neural nets in PyTorch.
- Basic linear algebra and probability; familiarity with cross-entropy and MSE losses.
- Some NLP familiarity (tokens, vocab) is helpful but not required.
Who it’s for
- Developers who can train basic models and want a clear path into generative modeling.
- Learners who prefer code-first explanations with just enough math.
Format
- Duration: ~8–10 hours, self-paced.
- Structure: 6 modules, each with a short lesson, a worked example, and a coding notebook.