# The Transformer Era
A chronological walk through the post-2017 era. Each year's sub-section is small enough to read in one sitting; together they form the genealogy of modern frontier models.
## Timeline at a glance
| Year(s) | Theme | Highlights |
|---|---|---|
| 2017 | The attention revolution | "Attention Is All You Need" |
| 2018–2019 | The pretraining era | GPT-1/2, BERT, T5 |
| 2020 | Scaling & few-shot | GPT-3, scaling laws, efficient attention |
| 2020–2021 | Multimodal foundations | ViT, CLIP, DALL·E, Codex |
| 2020–2022 | Diffusion renaissance | DDPM, Stable Diffusion, DALL·E 2, Imagen |
| 2022 | Instruction tuning & RLHF | InstructGPT, ChatGPT, PaLM, Chinchilla |
| 2023 | GPT-4 & the open LLM wave | GPT-4, LLaMA 1/2, Mistral, Claude, Constitutional AI |
| 2023–2024 | Tool use, RAG, agents | RAG, ReAct, Toolformer, agent frameworks |
| 2023–2024 | Long context & MoE | RoPE/ALiBi, Mixtral, 1M+ contexts |
| 2023–2024 | Alternatives to attention | Mamba, RWKV, Hyena, RetNet |
| 2024 | Omni & multimodal frontier | GPT-4o, Claude 3, Gemini 1.5, LLaMA 3, Sora |
| 2024–2025 | Reasoning models | o1, DeepSeek-R1, process rewards, deep research |
| 2024–2025 | Video & 3D generation | Sora, Veo, Wan, advanced 3DGS |
| 2025–2026 | Agentic systems | Computer use, coding agents, frontier alignment |
## How to read
- Each year's section stands alone but builds on the one before it. If you're new to the area, start at 2017.
- Papers that are primarily multimodal-vision also live under Computer Vision · Advances (CSE5519); the imports cross-reference both.
- For broader fundamentals, see the Fundamentals & History and Deep Neural Networks tracks.