The Transformer Era

A chronological walk through the post-2017 era. Each year's sub-section is small enough to read in one sitting; together they form the genealogy of modern frontier models.

Timeline at a glance

| Year(s) | Theme | Highlights |
| --- | --- | --- |
| 2017 | The attention revolution | "Attention Is All You Need" |
| 2018–2019 | The pretraining era | GPT-1/2, BERT, T5 |
| 2020 | Scaling & few-shot | GPT-3, scaling laws, efficient attention |
| 2020–2021 | Multimodal foundations | ViT, CLIP, DALL·E, Codex |
| 2021–2022 | Diffusion renaissance | DDPM, Stable Diffusion, DALL·E 2, Imagen |
| 2022 | Instruction tuning & RLHF | InstructGPT, ChatGPT, PaLM, Chinchilla |
| 2023 | Open LLM wave | GPT-4, LLaMA 1/2, Mistral, Claude, Constitutional AI |
| 2023–2024 | Tool use, RAG, agents | RAG, ReAct, Toolformer, agent frameworks |
| 2023–2024 | Long context & MoE | RoPE/ALiBi, Mixtral, 1M+ contexts |
| 2023–2024 | Alternatives to attention | Mamba, RWKV, Hyena, RetNet |
| 2024 | Omni & multimodal frontier | GPT-4o, Claude 3, Gemini 1.5, LLaMA 3, Sora |
| 2024–2025 | Reasoning models | o1, DeepSeek-R1, process rewards, deep research |
| 2024–2025 | Video & 3D generation | Sora, Veo, Wan, advanced 3DGS |
| 2025–2026 | Agentic systems | Computer use, coding agents, frontier alignment |

Released under the MIT License. Content imported and adapted from NoteNextra.