Generative Adversarial Networks
A GAN trains two networks against each other: a generator $G$ that maps random noise $z$ to samples, and a discriminator $D$ that learns to tell generated samples from real ones. $G$ improves only by fooling $D$; $D$'s gradients are the only learning signal $G$ ever sees.
The original objective
Generative Adversarial Nets (Goodfellow et al., NeurIPS 2014) defines the game:

$$\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
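A minimal sketch of one alternating update under this objective (PyTorch; `G`, `D`, and the optimisers are assumed to exist, with `D` returning a raw logit $s$ so that $\log D(x) = -\mathrm{softplus}(-s)$ and $\log(1 - D(x)) = -\mathrm{softplus}(s)$):

```python
import torch
import torch.nn.functional as F

# One alternating update of the minimax game. G and D are assumed to be
# nn.Modules: G maps noise to samples, D maps a sample to a logit s.
def gan_step(G, D, opt_g, opt_d, real, z_dim=100):
    b = real.size(0)

    # D ascends E[log D(x)] + E[log(1 - D(G(z)))].
    z = torch.randn(b, z_dim, device=real.device)
    fake = G(z).detach()                          # no gradients into G here
    d_loss = F.softplus(-D(real)).mean() + F.softplus(D(fake)).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # G descends E[log(1 - D(G(z)))], the literal objective. When D
    # confidently rejects fakes this term's gradient vanishes (saturation);
    # the fix is discussed under "Training pathologies".
    z = torch.randn(b, z_dim, device=real.device)
    g_loss = -F.softplus(D(G(z))).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```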
GANs do not provide a likelihood — they are likelihood-free generators. This is both their strength (no Gaussian-blur penalty like VAEs) and their weakness (no held-out likelihood with which to compare models quantitatively).
Training pathologies
The min-max game has two recurring failure modes:
- Mode collapse — $G$ produces only a small subset of the data distribution, because doing so still fools $D$. The generator finds a corner of the data and refuses to leave.
- Training instability — gradients from $D$ vanish when it gets too good (saturation), or explode when it gets too bad. The standard recipes — alternating updates, label smoothing, gradient penalty — are workarounds; two are sketched below.
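A minimal sketch of two of those recipes, continuing the logit conventions above (function names are illustrative, not from a specific codebase):

```python
import torch
import torch.nn.functional as F

def g_loss_nonsaturating(D, fake):
    # Non-saturating generator loss: ascend E[log D(G(z))] instead of
    # descending E[log(1 - D(G(z)))]. Same fixed point, but the gradient
    # is strongest exactly when D rejects the fakes, i.e. when G needs it.
    return F.softplus(-D(fake)).mean()            # = -E[log D(G(z))]

def d_loss_label_smoothed(D, real, fake, real_target=0.9):
    # One-sided label smoothing: real targets of 0.9 instead of 1.0 keep
    # D from becoming over-confident and starving G of gradient signal.
    real_logits, fake_logits = D(real), D(fake.detach())
    smoothed = torch.full_like(real_logits, real_target)
    return (F.binary_cross_entropy_with_logits(real_logits, smoothed)
            + F.binary_cross_entropy_with_logits(
                  fake_logits, torch.zeros_like(fake_logits)))
```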
DCGAN
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (Radford, Metz, Chintala, ICLR 2016) gave the first stable convolutional GAN recipe: strided convs (no pooling), batch norm in both networks, ReLU in the generator (with a tanh output layer), and LeakyReLU in the discriminator.
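A generator following that recipe, sketched for 64×64 RGB output (the layer sizes are one common instantiation, not the only one):

```python
import torch.nn as nn

# DCGAN-style generator: strided transposed convolutions instead of
# pooling/upsampling, batch norm everywhere except the output layer,
# ReLU inside, tanh at the end.
class DCGANGenerator(nn.Module):
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0, bias=False),   # 1 -> 4
            nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1, bias=False),  # 4 -> 8
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1, bias=False),  # 8 -> 16
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1, bias=False),      # 16 -> 32
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1, bias=False),           # 32 -> 64
            nn.Tanh(),
        )

    def forward(self, z):                    # z: (batch, z_dim)
        return self.net(z.view(z.size(0), -1, 1, 1))
```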
WGAN — Wasserstein objective
Wasserstein GAN (Arjovsky, Chintala, Bottou, ICML 2017) replaces the JS divergence with the Wasserstein-1 distance, in its Kantorovich–Rubinstein dual form:

$$W_1(p_r, p_g) = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x \sim p_r}[f(x)] - \mathbb{E}_{x \sim p_g}[f(x)]$$
The "discriminator" becomes a 1-Lipschitz critic
StyleGAN and the photorealism plateau
Progressive Growing of GANs (Karras et al., ICLR 2018), StyleGAN (Karras, Laine, Aila, CVPR 2019), StyleGAN2 (CVPR 2020), and StyleGAN3 (NeurIPS 2021) drove face-image GANs to indistinguishable-from-real photorealism at 1024×1024. Architectural innovations: a mapping network that disentangles the latent space, adaptive instance normalisation (AdaIN) to inject style at each resolution, and (in StyleGAN3) alias-free, shift-equivariant generation. StyleGAN remained the SOTA face generator for years.
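A minimal sketch of the AdaIN operation (the real implementation's initialisation and noise-injection details are omitted): each feature map is normalised per sample, then re-scaled and re-shifted with a (scale, bias) pair produced from the style vector $w$ by a learned affine layer.

```python
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, channels, w_dim=512):
        super().__init__()
        self.affine = nn.Linear(w_dim, 2 * channels)  # w -> (scale, bias)

    def forward(self, x, w):                  # x: (B, C, H, W), w: (B, w_dim)
        scale, bias = self.affine(w).chunk(2, dim=1)
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-8
        x = (x - mu) / sigma                  # instance normalisation
        return scale[..., None, None] * x + bias[..., None, None]
```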
Conditional GANs and image-to-image
Conditional GAN (Mirza, Osindero, 2014) adds a label $y$ as an extra input to both networks: $G$ generates samples conditioned on the class, and $D$ judges (sample, label) pairs rather than samples alone. The same idea underlies image-to-image translation, where the "label" is an entire input image, as in pix2pix (Isola et al., CVPR 2017) for paired data and CycleGAN (Zhu et al., ICCV 2017) for unpaired data.
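A sketch of the label-conditioning mechanism for a toy MNIST-sized generator (the dimensions and the embedding scheme are illustrative assumptions; the discriminator receives the same embedding alongside its input):

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10, out_dim=784):
        super().__init__()
        # Embed the class label, then concatenate it with the noise
        # vector so every layer of G can condition on the class.
        self.embed = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256), nn.ReLU(True),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z, y):                  # z: (B, z_dim), y: (B,) int labels
        return self.net(torch.cat([z, self.embed(y)], dim=1))
```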
Why diffusion took over
GANs lost to diffusion on natural-image generation around 2022 for three reasons: diffusion trains on a stable single objective (no minimax), produces higher diversity at the same FID, and scales more straightforwardly with compute. GANs are still the right choice for very-fast inference (one forward pass vs hundreds of denoising steps), specialised face-generation tasks, and as a discriminator inside hybrid systems. The adversarial idea persists everywhere preference modelling does — RLHF reward models, learned losses, perceptual quality estimators.
What to read next
- Variational Autoencoders — the likelihood-based contemporary.
- Image Generation (CV) — where diffusion took over.
- Normalizing Flows — exact-likelihood generative models.