Hidden Markov Models
A Hidden Markov Model is a Markov chain whose states cannot be observed directly — instead, you see noisy observations conditioned on the state. HMMs are the canonical model for sequence data with discrete latent dynamics: speech recognition (1980s–2010s), part-of-speech tagging, gene-finding, and time-series segmentation. They illustrate the three classical inference problems — filtering, smoothing, and learning — in their cleanest form.
The model
An HMM has three components:
- Transition probabilities $A_{ij} = p(z_t = j \mid z_{t-1} = i)$ over hidden states.
- Emission probabilities $p(x_t \mid z_t = j)$. Common choices: categorical (discrete observations), Gaussian (continuous), or mixture of Gaussians.
- Initial distribution $\pi_i = p(z_1 = i)$.
The joint over hidden states $z_{1:T}$ and observations $x_{1:T}$ factorises as
$$p(x_{1:T}, z_{1:T}) = \pi_{z_1}\, p(x_1 \mid z_1) \prod_{t=2}^{T} A_{z_{t-1} z_t}\, p(x_t \mid z_t).$$
The conditional independence structure: each observation $x_t$ depends only on the current state $z_t$, and each state $z_t$ depends only on the previous state $z_{t-1}$.
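As a concrete illustration, here is a minimal numpy sketch of this factorisation, assuming categorical emissions; the toy parameters `pi`, `A`, `B` and their values are hypothetical, chosen only to make the example runnable.

```python
import numpy as np

# Hypothetical toy model: K = 2 hidden states, 3 discrete observation symbols.
pi = np.array([0.6, 0.4])                # initial distribution, pi_i = p(z_1 = i)
A = np.array([[0.7, 0.3],                # transitions, A_ij = p(z_t = j | z_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],           # emissions, B_jk = p(x_t = k | z_t = j)
              [0.1, 0.3, 0.6]])

def joint_log_prob(z, x):
    """log p(x_{1:T}, z_{1:T}) under the HMM factorisation."""
    logp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])
    for t in range(1, len(x)):
        logp += np.log(A[z[t - 1], z[t]]) + np.log(B[z[t], x[t]])
    return logp

print(joint_log_prob(z=[0, 0, 1], x=[0, 1, 2]))  # log-probability of one particular path
```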
Three classical problems
Rabiner's A Tutorial on Hidden Markov Models (1989) is the canonical reference. Three computations cover all the use cases:
- Likelihood / Filtering — given parameters $\theta = (\pi, A, \text{emissions})$ and observations $x_{1:T}$, compute $p(x_{1:T} \mid \theta)$ and $p(z_t \mid x_{1:t})$. Forward algorithm.
- Decoding — find the most likely hidden sequence: $\hat{z}_{1:T} = \arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$. Viterbi algorithm.
- Learning — estimate $\theta$ from observations alone. Baum-Welch (EM).
Each is a dynamic-programming computation costing $O(T K^2)$ time (per iteration, in the learning case), where $K$ is the number of hidden states and $T$ the sequence length.
Forward algorithm
Define the forward variable $\alpha_t(j) = p(x_{1:t}, z_t = j)$, which satisfies the recursion
$$\alpha_t(j) = p(x_t \mid z_t = j) \sum_{i} \alpha_{t-1}(i)\, A_{ij}.$$
Initialise with $\alpha_1(j) = \pi_j\, p(x_1 \mid z_1 = j)$. The likelihood is then $p(x_{1:T}) = \sum_j \alpha_T(j)$, and filtering is just normalisation: $p(z_t = j \mid x_{1:t}) = \alpha_t(j) / \sum_i \alpha_t(i)$.
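A minimal numpy sketch of the forward pass, assuming categorical emissions and the same hypothetical `pi`, `A`, `B` layout as above. It normalises $\alpha_t$ at each step (the standard scaling trick) so long sequences do not underflow; the log-likelihood is recovered as the sum of the log normalisers.

```python
import numpy as np

def forward(pi, A, B, x):
    """Scaled forward pass for categorical emissions.

    Returns the filtered posteriors p(z_t | x_{1:t}) as rows of `alpha_hat`,
    plus the data log-likelihood log p(x_{1:T}).
    """
    T, K = len(x), len(pi)
    alpha_hat = np.zeros((T, K))
    log_lik = 0.0
    a = pi * B[:, x[0]]                          # alpha_1(j) = pi_j * p(x_1 | z_1 = j)
    c = a.sum(); alpha_hat[0] = a / c; log_lik += np.log(c)
    for t in range(1, T):
        a = B[:, x[t]] * (alpha_hat[t - 1] @ A)  # emission * sum_i alpha_{t-1}(i) A_ij
        c = a.sum(); alpha_hat[t] = a / c; log_lik += np.log(c)
    return alpha_hat, log_lik
```

Called as `forward(pi, A, B, [0, 1, 2])` with the toy parameters above, it returns the filtered posteriors and the sequence log-likelihood.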
Viterbi algorithm
Replace the sum with a max: $\delta_t(j) = p(x_t \mid z_t = j) \max_i \delta_{t-1}(i)\, A_{ij}$, with $\delta_1(j) = \pi_j\, p(x_1 \mid z_1 = j)$.
Track back-pointers $\psi_t(j) = \arg\max_i \delta_{t-1}(i)\, A_{ij}$ and recover the optimal sequence by following them backwards from $\hat{z}_T = \arg\max_j \delta_T(j)$.
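A sketch of Viterbi in log space, under the same assumptions (categorical emissions, hypothetical `pi`, `A`, `B` arrays):

```python
import numpy as np

def viterbi(pi, A, B, x):
    """Most likely state sequence argmax_z p(z_{1:T} | x_{1:T}), computed in log space."""
    T, K = len(x), len(pi)
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.zeros((T, K))             # delta_t(j): best log joint prob of a path ending in state j
    psi = np.zeros((T, K), dtype=int)    # back-pointers psi_t(j)
    delta[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A    # scores[i, j] = delta_{t-1}(i) + log A_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, x[t]]
    z = np.zeros(T, dtype=int)
    z[-1] = delta[-1].argmax()                    # best final state
    for t in range(T - 2, -1, -1):                # follow back-pointers backwards
        z[t] = psi[t + 1, z[t + 1]]
    return z
```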
Baum-Welch — EM for HMMs
When the parameters $\theta = (\pi, A, \text{emissions})$ are unknown, the E-step runs the forward-backward algorithm to compute the posteriors $\gamma_t(j) = p(z_t = j \mid x_{1:T})$ and $\xi_t(i, j) = p(z_t = i, z_{t+1} = j \mid x_{1:T})$.
The M-step updates parameters by averaging these expected counts:
$$\hat{\pi}_i = \gamma_1(i), \qquad \hat{A}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)},$$
with the emission parameters re-estimated from the $\gamma$-weighted observations (e.g., weighted means and covariances for Gaussian emissions).
Each iteration monotonically increases the data likelihood. As with any EM, multiple restarts and good initialisation (e.g., k-means on Gaussian emissions) matter.
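For concreteness, here is one Baum-Welch iteration sketched in numpy for the categorical-emission case, using the scaled forward-backward recursions; the function name and array layout are illustrative, not a reference implementation.

```python
import numpy as np

def baum_welch_step(pi, A, B, x):
    """One EM iteration for an HMM with categorical emissions (scaled forward-backward)."""
    x = np.asarray(x)
    T, K = len(x), len(pi)

    # E-step: scaled forward pass (alpha) and backward pass (beta).
    alpha = np.zeros((T, K)); beta = np.zeros((T, K))
    c = np.zeros(T)                                   # per-step normalisers; np.log(c).sum() = log p(x)
    a = pi * B[:, x[0]]
    c[0] = a.sum(); alpha[0] = a / c[0]
    for t in range(1, T):
        a = B[:, x[t]] * (alpha[t - 1] @ A)
        c[t] = a.sum(); alpha[t] = a / c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, x[t + 1]] * beta[t + 1])) / c[t + 1]

    # Posterior marginals: gamma_t(j) = p(z_t = j | x), xi_t(i, j) = p(z_t = i, z_{t+1} = j | x).
    gamma = alpha * beta
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, x[1:]].T * beta[1:])[:, None, :] / c[1:, None, None])

    # M-step: re-estimate parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[x == k].sum(axis=0)       # expected emissions of symbol k from each state
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```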
What HMMs were used for
- Speech recognition — phoneme HMMs were the foundation of every commercial ASR system from the 1980s through the 2010s, before end-to-end neural models took over (DeepSpeech, RNN-T, Whisper).
- Part-of-speech tagging — Viterbi over a small POS state space, replaced by neural taggers around 2015.
- Bioinformatics — gene finding, protein-family alignment (HMMER), CpG island detection. Still in active use.
- Time-series segmentation — change-point detection, regime modelling in finance.
Why HMMs lost to RNNs and Transformers
HMMs assume:
- Discrete latent state of fixed size $K$. Modelling rich context requires exponentially large $K$.
- Independence of $x_t$ from history given $z_t$. Real sequences have long-range dependencies HMMs cannot capture.
RNNs replaced the discrete latent with a continuous hidden state of arbitrary expressivity; Transformers further removed the Markov assumption entirely. HMMs persist in low-data, interpretable, or tightly structured domains; the inference machinery (forward-backward, Viterbi) generalises to neural CRFs and structured-output networks.
What to read next
- Markov Chains — the underlying state dynamics.
- Conditional Random Fields — the discriminative descendant.
- Vanilla RNNs — the neural successor.