Normalizing Flows
A normalizing flow models a complex distribution as the pushforward of a simple base distribution (typically a standard Gaussian) through a sequence of invertible, differentiable transformations. The price of invertibility is paid up front in restricted layer designs, but in exchange the model gives exact log-likelihoods and exact sampling, neither of which VAEs (exact only up to a variational bound) or GANs (no likelihood at all) deliver.
The change-of-variables formula
If z ~ p_Z and x = f(z) for an invertible, differentiable f, the change-of-variables formula gives the exact density of x: log p_X(x) = log p_Z(f⁻¹(x)) − log |det J_f(f⁻¹(x))|, where J_f is the Jacobian of f.
A flow stacks K such transformations, x = f_K ∘ ⋯ ∘ f_1(z); the log-determinant terms add across layers, so the exact log-likelihood of a deep flow is the base log-density plus a sum of per-layer corrections.
The engineering challenge is designing layers that are expressive, cheap to invert, and have tractable Jacobian log-determinants: a dense d×d Jacobian costs O(d³) per determinant, so practical flows constrain the Jacobian to be triangular or otherwise structured.
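The formula above can be made concrete with a minimal sketch (not from the text): push a standard normal through the invertible map f(z) = exp(z) and evaluate the exact density of x = f(z), which is the familiar log-normal.

```python
import numpy as np

def log_prob_x(x):
    """Exact log-density of x = exp(z), z ~ N(0, 1), via change of variables."""
    z = np.log(x)                                 # f^{-1}(x)
    log_pz = -0.5 * (z ** 2 + np.log(2 * np.pi))  # N(0, 1) log-density at z
    log_det = z                                   # log|df/dz| = log(exp(z)) = z
    return log_pz - log_det                       # log p_X = log p_Z - log|det J|
```

The result matches the closed-form log-normal density term for term, which is exactly what "exact likelihood" buys: no bound, no estimator, just arithmetic.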
Coupling layers: NICE and Real NVP
NICE: Non-linear Independent Components Estimation (Dinh, Krueger, Bengio, ICLR 2015 workshop track) introduced coupling layers: split the input into two halves (x1, x2), pass x1 through unchanged, and shift x2 by an arbitrary neural network of x1: y1 = x1, y2 = x2 + m(x1). The layer is trivially invertible (x2 = y2 − m(y1)) no matter how complex m is.
The Jacobian is triangular with unit diagonal, so its log-det is exactly zero. Real NVP (Dinh, Sohl-Dickstein, Bengio, ICLR 2017) generalises to affine coupling, y2 = x2 ⊙ exp(s(x1)) + t(x1), whose log-det is the sum of the scales s(x1); alternating which half is transformed from layer to layer lets every dimension get updated.
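An affine coupling layer fits in a few lines. A minimal sketch, with tiny fixed linear maps standing in for the scale and shift networks s and t (real implementations use deep nets; the linear maps here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Stand-ins for the conditioner networks s(.) and t(.): fixed linear maps.
Ws = rng.normal(size=(d // 2, d // 2)) * 0.1
Wt = rng.normal(size=(d // 2, d // 2)) * 0.1

def coupling_forward(x):
    x1, x2 = x[: d // 2], x[d // 2 :]
    s, t = Ws @ x1, Wt @ x1            # scale and shift depend only on x1
    y2 = x2 * np.exp(s) + t            # affine coupling (Real NVP style)
    log_det = s.sum()                  # triangular Jacobian: sum of log-scales
    return np.concatenate([x1, y2]), log_det

def coupling_inverse(y):
    y1, y2 = y[: d // 2], y[d // 2 :]
    s, t = Ws @ y1, Wt @ y1            # recomputable from the unchanged half
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2])
```

Note that inversion never inverts the conditioner itself: because y1 = x1 passes through untouched, s and t can simply be recomputed, which is why s and t may be arbitrarily complex networks.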
Glow
Glow: Generative Flow with Invertible 1×1 Convolutions (Kingma, Dhariwal, NeurIPS 2018) replaced the fixed permutations between coupling layers with learned invertible 1×1 convolutions, parameterised via an LU decomposition so the log-determinant reduces to a sum over the diagonal of U. Glow demonstrated photorealistic face generation at 256×256 from a flow, competitive with contemporaneous GANs, and became the canonical "modern flow" reference.
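A 1×1 convolution is just one shared matrix multiply over the channel axis, so its log-determinant is easy to reason about. A minimal sketch (shapes and names are illustrative, not Glow's code):

```python
import numpy as np

rng = np.random.default_rng(1)
c, h, w = 3, 8, 8
W = rng.normal(size=(c, c))            # learned channel-mixing matrix

def conv1x1(x, W):
    # A 1x1 convolution mixes channels identically at every spatial position.
    return np.einsum("ij,jhw->ihw", W, x)

x = rng.normal(size=(c, h, w))
y = conv1x1(x, W)

# The same W acts at each of the h*w positions, so the layer's log-det is
# h * w * log|det W|. Glow stores W = P L U so this is a cheap sum of
# log|diag(U)| rather than a fresh determinant every step.
sign, logabsdet = np.linalg.slogdet(W)
layer_logdet = h * w * logabsdet

x_rec = conv1x1(y, np.linalg.inv(W))   # invert by applying W^{-1}
```

The LU parameterisation turns an O(c³) determinant into an O(c) sum while keeping the permutation-like channel mixing fully learnable.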
Autoregressive flows
Masked Autoregressive Flow (MAF; Papamakarios et al., NeurIPS 2017) and Inverse Autoregressive Flow (IAF; Kingma et al., NeurIPS 2016) are flows where the transformation is autoregressive: dimension i is shifted and scaled by functions of the preceding dimensions, x_i = u_i · σ_i(x_{1:i−1}) + μ_i(x_{1:i−1}), computed in one pass by a masked network (MADE).
The Jacobian is triangular; the log-det is the sum of the log-scales, Σ_i log σ_i. MAF conditions on previous inputs, giving one-pass density evaluation but sequential sampling; IAF conditions on previous noise variables, giving the reverse trade-off.
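The two directions of a MAF-style transform can be sketched directly. Here strictly-lower-triangular linear maps stand in for the masked MADE conditioners (an assumption for brevity, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
# Stand-ins for MADE: mu_i and log_sigma_i depend only on x_{<i}.
Lmu = np.tril(rng.normal(size=(d, d)) * 0.1, k=-1)
Lls = np.tril(rng.normal(size=(d, d)) * 0.1, k=-1)

def maf_forward(x):
    # Density-evaluation direction: all dimensions computed in parallel.
    mu, log_sigma = Lmu @ x, Lls @ x
    u = (x - mu) * np.exp(-log_sigma)  # map data back to base noise
    log_det = -log_sigma.sum()         # triangular Jacobian: diagonal logs
    return u, log_det

def maf_inverse(u):
    # Sampling direction: inherently sequential, one dimension at a time.
    x = np.zeros(d)
    for i in range(d):
        mu_i = Lmu[i] @ x              # uses only the x_{<i} filled so far
        ls_i = Lls[i] @ x
        x[i] = u[i] * np.exp(ls_i) + mu_i
    return x
```

The asymmetry is visible in the code: `maf_forward` is one vectorised pass, while `maf_inverse` needs a d-step loop, which is exactly MAF's fast-density / slow-sampling trade-off (IAF swaps which direction gets the loop).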
Continuous flows: FFJORD and neural ODEs
Neural Ordinary Differential Equations (Chen et al., NeurIPS 2018) replaces the discrete chain of layers with continuous-time dynamics dz/dt = f(z, t); the log-density then evolves by an instantaneous change-of-variables formula, d log p(z(t))/dt = −tr(∂f/∂z), so the Jacobian log-determinant becomes a trace integrated along the trajectory. FFJORD (Grathwohl et al., ICLR 2019) made this scale with an unbiased Hutchinson estimator of the trace.
Continuous flows free the model from coupling-layer architectural restrictions but add ODE-solver compute at training and sampling time. Subsequent flow-matching and rectified-flow approaches (Lipman et al., Liu et al., 2022–23) reformulate the training objective to bypass the integration entirely, and have become competitive with diffusion for image generation.
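The appeal of flow matching is how little the training objective involves: no solver, no trace, just regression onto a target velocity. A minimal sketch of the linear-path (rectified-flow style) objective, with a hypothetical stand-in model:

```python
import numpy as np

rng = np.random.default_rng(3)

def flow_matching_loss(velocity_fn, x1_batch):
    """Regress a velocity field onto the straight-line noise-to-data path."""
    x0 = rng.normal(size=x1_batch.shape)          # base noise samples
    t = rng.uniform(size=(x1_batch.shape[0], 1))  # random times in [0, 1]
    xt = (1 - t) * x0 + t * x1_batch              # point on the straight path
    target = x1_batch - x0                        # that path's constant velocity
    pred = velocity_fn(xt, t)
    return np.mean((pred - target) ** 2)          # plain squared-error regression

# Hypothetical stand-in for a learned model: the zero velocity field.
loss = flow_matching_loss(lambda xt, t: np.zeros_like(xt),
                          rng.normal(size=(16, 2)))
```

Sampling still integrates the learned field from t = 0 to 1, but training touches no ODE solver at all, which is the cost the continuous-flow likelihood objective could not avoid.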
Why flows lost on natural images
Flows were a serious image-generation contender ~2018 but were overtaken by diffusion for two reasons: the invertibility constraint forces relatively low expressivity per layer (so flows need many layers), and they don't compress to a low-dimensional latent the way VAEs do (so naive flows operate at full pixel resolution). Diffusion bypasses both by giving up exact likelihood.
Flows persist where exact likelihood matters — physics simulation, density estimation, particle physics, simulation-based inference — and as the conceptual ancestor of flow-matching, which has come back to image generation.
What to read next
- Variational Autoencoders — approximate-likelihood alternative.
- Generative Adversarial Networks — likelihood-free alternative.
- Image Generation (CV) — diffusion took over the image task; flow-matching is reclaiming ground.