# The Transformer Era
A chronological walk through the post-2017 era. Each year's sub-section is small enough to read in one sitting; together they form the genealogy of modern frontier models.
## Timeline at a glance
| Year(s) | Theme | Highlights |
|---|---|---|
| 2017 | The attention revolution | "Attention Is All You Need" |
| 2018–2019 | The pretraining era | GPT-1/2, BERT, T5 |
| 2020 | Scaling & few-shot | GPT-3, scaling laws, efficient attention |
| 2020–2021 | Multimodal foundations | ViT, CLIP, DALL·E, Codex |
| 2020–2022 | Diffusion renaissance | DDPM, Stable Diffusion, DALL·E 2, Imagen |
| 2022 | Instruction tuning & RLHF | InstructGPT, ChatGPT, PaLM, Chinchilla |
| 2023 | GPT-4 & the open LLM wave | GPT-4, LLaMA 1/2, Mistral, Claude, Constitutional AI |
| 2023–2024 | Tool use, RAG, agents | RAG, ReAct, Toolformer, agent frameworks |
| 2023–2024 | Long context & MoE | RoPE/ALiBi, Mixtral, 1M+ contexts |
| 2023–2024 | Alternatives to attention | Mamba, RWKV, Hyena, RetNet |
| 2024 | Omni & multimodal frontier | GPT-4o, Claude 3, Gemini 1.5, LLaMA 3, Sora |
| 2024–2025 | Reasoning models | o1, DeepSeek-R1, process rewards, deep research |
| 2024–2025 | Video & 3D generation | Sora, Veo, Wan, advanced 3DGS |
| 2025–2026 | Agentic systems | Computer use, coding agents, frontier alignment |
## How to read
- Each year's section stands alone but builds on the one before it. If you're new to the area, start at 2017.
- Papers that are primarily multimodal-vision also live under Computer Vision · Advances (CSE5519); the imports cross-reference both.
- For broader fundamentals, see the Fundamentals & History and Deep Neural Networks tracks.