PCA & SVD
Principal Component Analysis is the canonical linear dimensionality-reduction technique: find the directions of maximum variance in the data and project onto the top $k$ of them.
The objective
Given a centred data matrix $X \in \mathbb{R}^{n \times d}$ (column means subtracted), find an orthonormal projection $W \in \mathbb{R}^{d \times k}$ that maximises the projected variance:

$$\max_{W} \; \operatorname{tr}\!\left(W^\top \Sigma\, W\right), \qquad \Sigma = \tfrac{1}{n} X^\top X,$$

with the constraint $W^\top W = I_k$. The solution is the top $k$ eigenvectors of the sample covariance $\Sigma$.
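For completeness, here is a short derivation sketch (not in the original text) of why the optimum is an eigenvector, using the Lagrangian for a single direction $w$ with $w^\top w = 1$:

$$\mathcal{L}(w, \lambda) = w^\top \Sigma w - \lambda\,(w^\top w - 1)
\;\Longrightarrow\;
\nabla_w \mathcal{L} = 2\Sigma w - 2\lambda w = 0
\;\Longrightarrow\;
\Sigma w = \lambda w .$$

Every stationary point is an eigenvector of $\Sigma$, and the projected variance at that point is the corresponding eigenvalue $\lambda$, so the maximiser is the leading eigenvector; the argument extends column by column to $W$.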
SVD computation
For the centred data matrix $X$, take the (thin) singular value decomposition

$$X = U S V^\top.$$

- The columns of $V$ are the principal components (eigenvectors of $X^\top X$).
- The squared singular values $s_i^2 / n$ are the variances along each component.
- The projected data is $X V_k = U_k S_k$ — the first $k$ columns of $U S$.
For large matrices, a truncated or randomised SVD computes only the leading $k$ components without forming the full decomposition.
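As a concrete sketch (not from the original text, variable names are my own), PCA via the SVD in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))           # toy data: n = 500 samples, d = 10 features
X = X - X.mean(axis=0)                   # centre each column

# Thin SVD of the centred data: X = U S V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 3
components = Vt[:k]                      # principal components (rows = eigenvectors of X^T X)
explained_var = s[:k] ** 2 / X.shape[0]  # variance along each component
Z = X @ Vt[:k].T                         # projected data, equivalently U[:, :k] * s[:k]

# sanity check: the two expressions for the projection agree
assert np.allclose(Z, U[:, :k] * s[:k])
```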
Three views
PCA can be derived from three independent objectives that all agree:
- Maximum variance — pick the projection whose projected variance is largest.
- Minimum reconstruction error — pick the projection whose reconstruction $\hat{X} = X W W^\top$ minimises $\lVert X - \hat{X} \rVert_F^2$.
- Decorrelation — find an orthonormal basis in which the projected features are linearly uncorrelated.
These coincide because of the Eckart–Young theorem: the best rank-$k$ approximation to $X$ in Frobenius norm is the truncated SVD $X_k = U_k S_k V_k^\top$, which is exactly the reconstruction PCA produces.
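A quick numerical check of this equivalence, a sketch on synthetic data: the Frobenius error of the rank-$k$ reconstruction equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
X_hat = (U[:, :k] * s[:k]) @ Vt[:k]      # best rank-k approximation (Eckart–Young)

# Frobenius reconstruction error equals the variance left in the discarded components
err = np.linalg.norm(X - X_hat, "fro") ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```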
Whitening
After projecting, you can whiten by dividing each component by its singular value (rescaling by $\sqrt{n}$ so the sample covariance is exactly the identity):

$$X_{\text{white}} = \sqrt{n}\; X V_k S_k^{-1} = \sqrt{n}\; U_k.$$
The result has unit covariance and is the input format expected by some downstream methods (Fisher LDA, ICA). Whitening is a regularisation choice — it equalises components, removing the natural variance-based weighting.
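A minimal sketch of this whitening step, assuming the $\sqrt{n}$ scaling above (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5)) @ np.diag([5.0, 3.0, 1.0, 0.5, 0.1])  # anisotropic toy data
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
n = X.shape[0]

Z = X @ Vt[:k].T                         # projected scores, variance s_i^2 / n per component
Z_white = np.sqrt(n) * Z / s[:k]         # divide each component by its singular value

# each whitened component now has unit variance and zero cross-correlation
assert np.allclose(Z_white.T @ Z_white / n, np.eye(k), atol=1e-8)
```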
Choosing $k$
- Variance explained — choose the smallest $k$ such that $\sum_{i \le k} s_i^2 \big/ \sum_i s_i^2 \ge 0.95$ (or 0.99). Standard but ad hoc (see the sketch after this list).
- Scree plot — plot $s_i^2$ vs $i$, look for an "elbow".
- Cross-validate — pick the $k$ minimising downstream-task error.
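A minimal sketch of the variance-explained rule; the helper `choose_k` and the 0.95 default are my own illustration:

```python
import numpy as np

def choose_k(singular_values: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest k whose components explain at least `threshold` of the total variance."""
    var = singular_values ** 2
    cum = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cum, threshold) + 1)

# example with a decaying spectrum
s = np.array([10.0, 5.0, 2.0, 1.0, 0.5])
print(choose_k(s))   # -> 2: the first two components explain ~96% of the variance
```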
Probabilistic PCA
Probabilistic PCA (Tipping & Bishop, 1999) gives a generative model:

$$x = W z + \mu + \varepsilon, \qquad z \sim \mathcal{N}(0, I_k), \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d).$$
The maximum-likelihood solution recovers the principal subspace: $W_{\text{ML}} = V_k (\Lambda_k - \sigma^2 I)^{1/2} R$ for an arbitrary rotation $R$, where $\Lambda_k$ holds the top $k$ covariance eigenvalues, and the ML noise variance $\sigma^2$ is the mean of the discarded eigenvalues. In the limit $\sigma^2 \to 0$, PPCA reduces to standard PCA.
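A sketch of the closed-form ML fit under these assumptions (the function `ppca_ml` is my own naming, following the Tipping & Bishop result above):

```python
import numpy as np

def ppca_ml(X: np.ndarray, k: int):
    """Closed-form ML estimates (W, sigma2) for probabilistic PCA."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                                   # sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)                # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending
    sigma2 = eigvals[k:].mean()                         # ML noise variance: mean of discarded eigenvalues
    W = eigvecs[:, :k] * np.sqrt(eigvals[:k] - sigma2)  # W_ML, up to an arbitrary rotation R
    return W, sigma2

rng = np.random.default_rng(3)
W, sigma2 = ppca_ml(rng.normal(size=(1000, 6)), k=2)
```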
Limitations
- Linear. PCA finds the best linear subspace. Non-linear structure (manifolds, clusters) is missed.
- Variance is not always meaningful. High variance might come from noise or scale, not signal. Standardise inputs first if features are on different scales (see the sketch after this list).
- Sensitive to outliers. Squared error is dominated by extreme points; consider robust PCA for noisy data.
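As a sketch of the standardise-first advice, assuming scikit-learn is available (the pipeline and toy data below are illustrative, not a prescription):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardising first stops large-scale features from dominating the variance.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))

rng = np.random.default_rng(4)
X = np.column_stack([rng.normal(scale=1000.0, size=200),   # a feature measured in large units
                     rng.normal(scale=0.01, size=200),     # a feature measured in tiny units
                     rng.normal(scale=1.0, size=200)])
Z = pipeline.fit_transform(X)            # 2-D scores on comparable scales
```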
For non-linear structure use t-SNE / UMAP or autoencoders. For non-Gaussian latents, use ICA or normalising flows.
What PCA is for, today
- Visualisation of high-dimensional data — project to 2D / 3D for inspection.
- Compression with reconstruction guarantees — the top-$k$ SVD is the optimal rank-$k$ approximation in Frobenius norm.
- Pre-processing for downstream methods sensitive to dimensionality (kNN, GMM, clustering).
- Feature analysis — examining principal components reveals dominant axes of variation.
- Inside larger systems — covariance estimators in finance, denoising, image compression (DCT-style).
In the deep-learning era, PCA's role has shifted from feature extractor to diagnostic tool — it tells you whether low-dimensional structure exists before you commit to a more expressive model.
What to read next
- Manifold Learning — non-linear dimensionality reduction.
- Linear Algebra Recap — the SVD that powers PCA.
- LDA / QDA — the supervised analogue.