# Markov Chains
A Markov chain is a stochastic process where the next state depends only on the current state, not on the history that led to it. The Markov property is the structural assumption underneath HMMs, MDPs, MCMC, n-gram language models, and most of probabilistic time-series modelling. Understanding the chain's stationary behaviour and convergence rate is the prerequisite for the rest.
## The Markov property

A discrete-time stochastic process $(X_t)_{t \ge 0}$ on a state space $S$ has the Markov property if

$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \dots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i).$$

For finite $S$, the one-step transition probabilities form a stochastic matrix $P$, with $P_{ij} = P(X_{t+1} = j \mid X_t = i)$ and every row summing to 1.

Probability evolution is matrix multiplication: if $\mu_t$ is the row vector of state probabilities at time $t$, then $\mu_{t+1} = \mu_t P$, and after $n$ steps $\mu_{t+n} = \mu_t P^n$.
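A minimal sketch of this in NumPy; the 3-state matrix `P` below is a made-up example, not one from the text:

```python
import numpy as np

# Hypothetical 3-state chain; each row of P sums to 1 (stochastic matrix).
P = np.array([[0.8, 0.15, 0.05],
              [0.3, 0.40, 0.30],
              [0.2, 0.50, 0.30]])

mu = np.array([1.0, 0.0, 0.0])    # start deterministically in state 0

mu = mu @ P                       # one step: mu_{t+1} = mu_t P

# n steps at once: mu_{t+n} = mu_t P^n
mu_10 = np.array([1.0, 0.0, 0.0]) @ np.linalg.matrix_power(P, 10)
print(mu_10)                      # already close to the stationary distribution
```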
## Classifications
States have structure:
- Recurrent — the chain returns to it with probability 1.
- Transient — only visited finitely many times almost surely.
- Absorbing — once entered, never left ($P_{ii} = 1$).
- Periodic — return times share a common divisor $d > 1$.
A chain is irreducible if every state can reach every other; aperiodic if no state has a period greater than 1. A finite chain that is both is called ergodic.
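Both chain-level properties are mechanically checkable. A sketch, assuming NumPy, with helper names of my own choosing: irreducibility reduces to graph reachability, since the chain is irreducible exactly when the boolean "reachable in at most one step" matrix saturates to all-ones under repeated squaring.

```python
import numpy as np

def is_irreducible(P: np.ndarray) -> bool:
    """True iff every state can reach every other state."""
    n = P.shape[0]
    # R[i, j] = "j reachable from i in <= 1 step" (self-loops included).
    R = (P > 0) | np.eye(n, dtype=bool)
    # Squaring doubles the path length captured, so log2(n) rounds suffice.
    for _ in range(max(1, int(np.ceil(np.log2(n))))):
        R = (R.astype(int) @ R.astype(int)) > 0
    return bool(R.all())

def absorbing_states(P: np.ndarray) -> list:
    """States with P_ii = 1: once entered, never left."""
    return [i for i in range(P.shape[0]) if np.isclose(P[i, i], 1.0)]
```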
## Stationary distribution

The stationary distribution $\pi$ is a probability vector that the dynamics leave fixed:

$$\pi P = \pi,$$

i.e. a left eigenvector of $P$ with eigenvalue 1, normalised to sum to 1. For a finite, irreducible, aperiodic chain, $\pi$ exists, is unique, and $\mu_t \to \pi$ from any initial distribution.

A useful sufficient condition: if there exists a distribution $\pi$ satisfying detailed balance, $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i, j$, then $\pi$ is stationary (sum both sides over $i$). Chains admitting such a $\pi$ are called reversible.
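Both facts translate directly into code. A sketch, assuming NumPy, with illustrative function names: $\pi$ as the eigenvalue-1 left eigenvector, and detailed balance as symmetry of the probability-flow matrix $F_{ij} = \pi_i P_{ij}$.

```python
import numpy as np

def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Left eigenvector of P for eigenvalue 1, normalised to a distribution."""
    evals, evecs = np.linalg.eig(P.T)        # right eigvecs of P.T = left of P
    k = np.argmin(np.abs(evals - 1.0))       # eigenvalue closest to 1
    pi = np.real(evecs[:, k])
    return pi / pi.sum()

def is_reversible(P: np.ndarray, pi: np.ndarray, tol: float = 1e-10) -> bool:
    """Detailed balance: pi_i P_ij == pi_j P_ji for all i, j."""
    F = pi[:, None] * P                      # flow matrix F_ij = pi_i P_ij
    return np.allclose(F, F.T, atol=tol)
```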
## Mixing time and the spectral gap

How fast does $\mu_t$ approach $\pi$? Define the mixing time

$$t_{\text{mix}}(\epsilon) = \min\{\, t : \max_{\mu_0} \|\mu_0 P^t - \pi\|_{\text{TV}} \le \epsilon \,\},$$

where $\|\cdot\|_{\text{TV}}$ is total variation distance. For a reversible chain, convergence is geometric at rate $|\lambda_2|$, the second-largest eigenvalue modulus of $P$, and

$$t_{\text{mix}}(\epsilon) \le \frac{1}{\gamma} \log \frac{1}{\epsilon\, \pi_{\min}},$$

where $\gamma = 1 - |\lambda_2|$ is the spectral gap.
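A sketch of the corresponding computation in NumPy, valid for reversible chains; the bound is the standard reversible-chain estimate and the names are illustrative.

```python
import numpy as np

def spectral_gap(P: np.ndarray) -> float:
    """Gap = 1 - |lambda_2|, the second-largest eigenvalue modulus of P."""
    mods = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]  # mods[0] == 1
    return 1.0 - mods[1]

def mixing_time_bound(P: np.ndarray, pi: np.ndarray, eps: float = 0.25) -> float:
    """Upper bound t_mix(eps) <= (1/gap) * log(1 / (eps * pi_min))."""
    return np.log(1.0 / (eps * pi.min())) / spectral_gap(P)
```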
The spectral gap is what determines whether MCMC inference is fast or slow. Multimodal target distributions typically produce small gaps and slow mixing; this is one of the central practical problems with MCMC.
## MCMC: building chains for inference

Markov Chain Monte Carlo runs the logic in reverse: given a target distribution $\pi$, construct a transition kernel whose stationary distribution is $\pi$, run the chain until it mixes, and treat the visited states as (correlated) samples from $\pi$.
- Metropolis-Hastings — propose moves from a proposal distribution $q(x' \mid x)$, accept with probability $\min\!\left(1, \frac{\pi(x')\, q(x \mid x')}{\pi(x)\, q(x' \mid x)}\right)$. Detailed balance gives stationarity; a minimal sketch follows this list.
- Gibbs sampling — sample each variable from its conditional given the others. A special case of MH whose proposals are always accepted.
- Hamiltonian Monte Carlo — use gradient information to take long, low-rejection moves; fast mixing for continuous, high-dimensional targets.
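The promised sketch, in NumPy: random-walk Metropolis, the special case of MH with a symmetric Gaussian proposal, so $q$ cancels from the acceptance ratio. The bimodal `log_target` is a made-up example; note that only an unnormalised log density is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x: float) -> float:
    """Unnormalised log density: mixture of two unit Gaussians at +/-2."""
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def metropolis(log_target, x0=0.0, n_samples=10_000, step=1.0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_samples):
        x_prop = x + step * rng.normal()
        # Accept with prob min(1, pi(x')/pi(x)); the normaliser cancels.
        if np.log(rng.uniform()) < log_target(x_prop) - log_target(x):
            x = x_prop
        samples[t] = x
    return samples

samples = metropolis(log_target)
```

Tuning `step` is the usual first knob: too small and the chain crawls within one mode; too large and almost every proposal is rejected. Either failure shows up as a small effective spectral gap.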
MCMC is the workhorse of Bayesian inference, posterior sampling, and any setting where you can evaluate the target density only up to a normalising constant: the constant cancels in the acceptance ratio, so the unnormalised density is enough.
## Where Markov chains appear in ML
- Reinforcement learning — MDPs are Markov chains with actions and rewards.
- Hidden Markov Models — see HMM.
- n-gram language models — text as an order-$(n-1)$ Markov chain over tokens.
- MCMC inference — Bayesian posterior sampling.
- PageRank — Google's original ranking is the stationary distribution of a Markov chain on the web graph; a power-iteration sketch follows this list.
- Chain-of-thought reasoning at inference — autoregressive LM generation is a Markov chain over token sequences (the state being the prefix).
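A sketch of the PageRank chain as power iteration, assuming NumPy; $d = 0.85$ is the commonly cited damping default, and the function is an illustration rather than Google's implementation. The "random surfer" follows a link with probability $d$ and teleports to a uniformly random page otherwise, which makes the chain irreducible and aperiodic, so a unique stationary distribution exists.

```python
import numpy as np

def pagerank(A: np.ndarray, d: float = 0.85, tol: float = 1e-10) -> np.ndarray:
    """Stationary distribution of the random-surfer chain on adjacency matrix A
    (A[i, j] = 1 iff page i links to page j)."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    # Normalise rows; dangling pages (no outlinks) teleport uniformly.
    P = np.where(out > 0, A / np.maximum(out, 1), 1.0 / n)
    G = d * P + (1.0 - d) / n                # the "Google matrix"
    pi = np.full(n, 1.0 / n)
    while True:                              # power iteration: pi <- pi G
        pi_next = pi @ G
        if np.abs(pi_next - pi).sum() < tol:
            return pi_next
        pi = pi_next
```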
## What to read next
- Hidden Markov Models — Markov chains with partial observation.
- MDPs & Bellman Equations — Markov chains with actions and rewards.
- Bayesian Networks — directed graphical models that subsume Markov chains.