Ordinary Least Squares (OLS)
OLS is the first model worth deeply understanding. It is the simplest predictor that makes a non-trivial probabilistic assumption, and almost every modern technique — ridge, logistic regression, neural network linear layers, even the final projection of an LLM — reduces to OLS in some limit.
Setup
Given a design matrix $X \in \mathbb{R}^{n \times p}$ and a response vector $y \in \mathbb{R}^{n}$, OLS chooses the coefficient vector $\beta \in \mathbb{R}^{p}$ that minimizes the residual sum of squares $\lVert y - X\beta \rVert_2^2$.
Closed form
Setting the gradient of $\lVert y - X\beta \rVert_2^2$ to zero gives the normal equations
$$X^\top X \hat\beta = X^\top y,$$
so when $X^\top X$ is invertible the closed-form solution is
$$\hat\beta = (X^\top X)^{-1} X^\top y.$$
In practice we never form the inverse — we solve the linear system via QR or SVD for numerical stability.
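As a minimal sketch (the data, dimensions, and noise level here are made up for illustration), both stable approaches can be exercised in NumPy — a QR factorization solved by back-substitution, and `np.linalg.lstsq`, which uses an SVD-based routine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: n = 100 observations, p = 3 features.
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# QR route: X = QR, then solve the triangular system R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# SVD route: lstsq also handles rank-deficient X gracefully.
beta_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

# Both routes agree, and neither ever forms (X^T X)^{-1} explicitly.
assert np.allclose(beta_qr, beta_svd)
```

Note that `np.linalg.solve(R, ...)` is used here for brevity; a dedicated triangular solver (`scipy.linalg.solve_triangular`) would exploit the structure of $R$.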
Geometric view
The fitted values $\hat y = X\hat\beta = X(X^\top X)^{-1}X^\top y$ are the orthogonal projection of $y$ onto the column space of $X$. Equivalently, the residual $y - \hat y$ is orthogonal to every column of $X$ — which is exactly what the normal equations say.
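The projection picture can be checked numerically. The sketch below (synthetic data, arbitrary dimensions chosen for illustration) verifies that the residual of a least-squares fit is orthogonal to each column of $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))   # synthetic design matrix
y = rng.normal(size=50)        # arbitrary response vector

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta_hat

# The residual lies in the orthogonal complement of col(X): X^T r = 0,
# up to floating-point round-off.
assert np.max(np.abs(X.T @ residual)) < 1e-8
```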
Probabilistic view
If $y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, then maximizing the Gaussian likelihood over $\beta$ is equivalent to minimizing $\lVert y - X\beta \rVert_2^2$, so the maximum-likelihood estimator coincides with the OLS estimator.
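The equivalence is a one-line calculation on the log-likelihood of the Gaussian model:

```latex
% Log-likelihood of y = X\beta + \varepsilon, \varepsilon ~ N(0, \sigma^2 I)
\ell(\beta) = -\frac{n}{2}\log(2\pi\sigma^2)
              - \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2
% Only the second term depends on \beta, so
\hat\beta_{\mathrm{MLE}}
  = \arg\max_{\beta} \ell(\beta)
  = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
  = \hat\beta_{\mathrm{OLS}}.
```

The noise variance $\sigma^2$ only rescales the objective; it affects uncertainty estimates but not the point estimate $\hat\beta$.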
What to read next
- Ridge & Lasso Regression — what to do when $X^\top X$ is singular or ill-conditioned.
- Logistic Regression — replace Gaussian noise with Bernoulli; everything else is the same story.
- Generalized Linear Models — the unifying framework.
Stub status
This page has a seed introduction. Expand sections on Gauss–Markov, leverage, influence, and the bias–variance decomposition.