Linear & Quadratic Discriminant Analysis
LDA and QDA are generative classifiers: they model each class's feature distribution as a multivariate Gaussian, then classify by Bayes' rule. They sit between logistic regression (also linear, but discriminative) and Naive Bayes (also generative, but with a stronger independence assumption). LDA in particular is a workhorse baseline that doubles as a dimensionality-reduction technique.
The model
For each class $k$, assume the features are Gaussian given the class, $x \mid y = k \sim \mathcal{N}(\mu_k, \Sigma_k)$, with class prior $\pi_k = P(y = k)$.
By Bayes' rule, the log-posterior is
$$\log P(y = k \mid x) = \log \pi_k - \tfrac{1}{2}\log|\Sigma_k| - \tfrac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + \text{const.}$$
The classifier picks the class with the highest log-posterior. The shape of the decision boundary depends on what we assume about the covariances $\Sigma_k$.
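As a concrete sketch, here is the generic decision rule in NumPy/SciPy, assuming the per-class priors, means, and covariances have already been estimated (the names `priors`, `means`, `covs` are illustrative, not from any library):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_posterior(x, priors, means, covs):
    """Unnormalised log-posterior: log pi_k + log N(x; mu_k, Sigma_k) per class."""
    return np.array([
        np.log(pi) + multivariate_normal.logpdf(x, mean=mu, cov=S)
        for pi, mu, S in zip(priors, means, covs)
    ])

def predict(x, priors, means, covs):
    # Pick the class with the highest log-posterior.
    return int(np.argmax(log_posterior(x, priors, means, covs)))
```

QDA and LDA differ only in how `covs` is fitted: one covariance per class for QDA, a single shared matrix for LDA.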
QDA — class-specific covariance
If each class has its own covariance $\Sigma_k$, the quadratic term $x^\top \Sigma_k^{-1} x$ differs across classes and does not cancel, so the decision boundaries are quadric surfaces (ellipses, parabolas, hyperbolas in 2D).
QDA needs $K \cdot \tfrac{d(d+1)}{2}$ covariance parameters for $K$ classes in $d$ dimensions, so it wants substantially more training data than LDA: each class must have enough samples to estimate its own full $d \times d$ covariance reliably.
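In practice you would rarely hand-roll this. A quick sketch with scikit-learn's `QuadraticDiscriminantAnalysis` on a synthetic dataset (the dataset parameters are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic 3-class problem, chosen only for illustration.
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

qda = QuadraticDiscriminantAnalysis()           # fits one full covariance per class
print(cross_val_score(qda, X, y, cv=5).mean())  # cross-validated accuracy
```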
LDA — shared covariance
If all classes share a single covariance $\Sigma_k = \Sigma$, the quadratic term $x^\top \Sigma^{-1} x$ is identical for every class and cancels out of the comparison, leaving a linear discriminant: $\delta_k(x) = x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2}\mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k$. Decision boundaries between classes are hyperplanes.
LDA estimates a single pooled covariance from all classes:
$$\hat\Sigma = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{i : y_i = k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^\top.$$
This makes LDA a close cousin of Gaussian Naive Bayes: Naive Bayes forces the covariance to be diagonal (features conditionally independent given the class), while LDA's full covariance captures correlations between features.
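A minimal NumPy sketch of the pooled estimate above (the function name is mine, not from any library):

```python
import numpy as np

def pooled_covariance(X, y):
    """Pooled within-class covariance: summed centred scatter divided by n - K."""
    n, d = X.shape
    classes = np.unique(y)
    scatter = np.zeros((d, d))
    for k in classes:
        Xk = X[y == k]
        Xc = Xk - Xk.mean(axis=0)  # centre each class at its own mean
        scatter += Xc.T @ Xc
    return scatter / (n - len(classes))
```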
LDA as dimensionality reduction
Beyond classification, LDA gives a supervised projection. The decision rules depend only on the projection of $x$ onto the subspace spanned by the class means, which has at most $K - 1$ dimensions. Fisher's formulation makes this explicit: find directions $w$ maximising the ratio of between-class to within-class scatter,
$$\max_w \; \frac{w^\top S_B\, w}{w^\top S_W\, w},$$
with $S_B = \sum_k n_k (\hat\mu_k - \hat\mu)(\hat\mu_k - \hat\mu)^\top$ and $S_W = \sum_k \sum_{i : y_i = k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^\top$. The solutions are the top eigenvectors of $S_W^{-1} S_B$.
This is the supervised counterpart to PCA: PCA finds high-variance directions, LDA finds high class-separation directions. For visualising a labelled dataset in 2D, LDA-projected scatter plots are often more informative than PCA-projected ones.
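A side-by-side sketch with scikit-learn, using Iris simply because it is a convenient built-in labelled dataset:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Supervised projection: at most K - 1 = 2 components for 3 classes.
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Unsupervised projection onto the top-variance directions.
Z_pca = PCA(n_components=2).fit_transform(X)
```

Scatter-plot `Z_lda` and `Z_pca` coloured by `y`; the LDA projection typically shows visibly cleaner class separation.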
Regularisation: shrinkage and RDA
When $d$ is large relative to $n$, the covariance estimates become ill-conditioned or singular, and inverting them amplifies noise. Two standard fixes:
- Shrinkage — replace $\hat\Sigma$ with $(1 - \alpha)\hat\Sigma + \alpha\,\mathrm{diag}(\hat\Sigma)$ for some $\alpha \in [0, 1]$, blending toward the diagonal-only (Naive Bayes) covariance.
- Regularised Discriminant Analysis (Friedman, 1989) — interpolate between QDA and LDA: $\hat\Sigma_k(\lambda) = (1 - \lambda)\hat\Sigma_k + \lambda\hat\Sigma$, with $\lambda \in [0, 1]$ chosen by cross-validation.
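Both fixes are one flag away in scikit-learn, though note its shrinkage target is a scaled identity (Ledoit-Wolf) rather than the diagonal covariance described above:

```python
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

# Shrinkage requires the 'lsqr' or 'eigen' solver; 'auto' sets the
# Ledoit-Wolf shrinkage intensity analytically.
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")

# reg_param shrinks each per-class covariance toward a multiple of the
# identity, in the same spirit as Friedman's RDA interpolation.
qda = QuadraticDiscriminantAnalysis(reg_param=0.1)
```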
When LDA / QDA win
- Modest dimension, Gaussian-ish features — LDA matches or beats logistic regression, often with much faster training.
- Multi-class problems with equal effort across classes — LDA naturally handles all $K$ classes at once.
- As a dimensionality-reduction step before another classifier — Fisher LDA gives strong class-separation directions almost for free.
The Gaussian assumption is rarely exactly true, but LDA is robust to mild deviations. For categorical or extremely skewed data, generalised additive models or tree-based methods are better.
What to read next
- Naive Bayes — same Gaussian likelihood, but with diagonal covariance.
- Logistic Regression — the discriminative analogue with the same linear boundary.
- PCA & SVD — the unsupervised analogue of Fisher LDA.