Generalized Linear Models
The Generalized Linear Model (GLM) is the unified framework that contains OLS, logistic regression, Poisson regression, and several others as special cases. Three ingredients — a distribution for the response, a linear predictor, and a link function — fit a wide variety of regression problems with one estimation algorithm (iteratively reweighted least squares, IRLS) and one set of theoretical guarantees.
The three components
Random component. The response $y_i$ is drawn from an exponential-family distribution with density $f(y; \theta, \phi) = \exp\big((y\theta - b(\theta))/a(\phi) + c(y, \phi)\big)$, where $\theta$ is the natural parameter and $\phi$ the dispersion parameter.
This includes Gaussian, Bernoulli, binomial, Poisson, gamma, and inverse Gaussian distributions. The cumulant function $b(\theta)$ determines the moments: $\mathbb{E}[Y] = b'(\theta)$ and $\operatorname{Var}(Y) = b''(\theta)\,a(\phi)$.
Systematic component. A linear predictor $\eta_i = x_i^\top \beta$ combines the covariates, exactly as in ordinary linear regression.
Link function. A monotone, differentiable function $g$ connects the mean of the response to the linear predictor: $g(\mu_i) = \eta_i$, so that $\mu_i = g^{-1}(x_i^\top \beta)$.
The canonical link is the one that makes the natural parameter equal to the linear predictor, $\theta_i = \eta_i$: identity for the Gaussian, logit for the Bernoulli, log for the Poisson.
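To make the three components concrete, here is a minimal NumPy sketch (the numbers are invented for illustration) that builds one linear predictor and maps it to a mean through the inverse of three canonical links:

```python
import numpy as np

x = np.array([1.0, 0.5, -1.2])         # one observation's features (first entry = intercept)
beta = np.array([0.3, 1.1, -0.4])      # hypothetical coefficients

eta = x @ beta                         # systematic component: eta = x' beta

# The link g maps the mean mu to eta; the inverse link maps eta back to mu.
mu_gaussian = eta                      # identity link (Gaussian): mu = eta
mu_bernoulli = 1 / (1 + np.exp(-eta))  # logit link (Bernoulli): mu = sigmoid(eta)
mu_poisson = np.exp(eta)               # log link (Poisson): mu = exp(eta)

print(mu_gaussian, mu_bernoulli, mu_poisson)
```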
Common GLMs in one table
| Response | Distribution | Canonical link | Use case |
|---|---|---|---|
| Continuous, unbounded | Gaussian | identity | linear regression |
| Binary | Bernoulli | logit | logistic regression |
| Counts | Poisson | log | event-rate modelling |
| Positive continuous | Gamma | inverse | duration, claim sizes |
| Proportions | Binomial | logit | bounded counts |
The strength of the GLM framework is that all of these fit with the same algorithm and share the same theoretical machinery.
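As a concrete illustration of that uniformity, the sketch below fits three families with the same call, assuming statsmodels is installed; the data are synthetic and the variable names are introduced here, not taken from the text:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))   # design matrix with an intercept column
eta = X @ np.array([0.5, 1.0, -0.5])             # true linear predictor

# Three responses generated from three different exponential-family distributions.
y_gauss = eta + rng.normal(size=200)
y_binary = rng.binomial(1, 1 / (1 + np.exp(-eta)))
y_count = rng.poisson(np.exp(eta))

# Same fitting call, different (distribution, canonical link) pairs; IRLS under the hood.
for y, family in [(y_gauss, sm.families.Gaussian()),
                  (y_binary, sm.families.Binomial()),
                  (y_count, sm.families.Poisson())]:
    fit = sm.GLM(y, X, family=family).fit()
    print(type(family).__name__, np.round(fit.params, 2))
```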
Maximum likelihood and IRLS
For the canonical link, the log-likelihood is concave in $\beta$ and its gradient has the OLS-like form $\nabla_\beta \ell(\beta) = X^\top (y - \mu) / a(\phi)$, the design matrix times the raw residuals.
The Hessian is $\nabla^2_\beta \ell(\beta) = -X^\top W X / a(\phi)$,
where $W = \operatorname{diag}\big(b''(\theta_1), \ldots, b''(\theta_n)\big)$ collects the variance functions evaluated at the current fit. Newton's method on this likelihood is exactly iteratively reweighted least squares: each update solves a weighted least-squares problem, $\beta \leftarrow (X^\top W X)^{-1} X^\top W z$, with working response $z = \eta + W^{-1}(y - \mu)$.
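As a sketch of what IRLS amounts to for the Bernoulli GLM with its canonical logit link (plain Newton steps, no regularisation or step-size control; `irls_logistic` is a name introduced here for illustration):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Newton-Raphson / IRLS for logistic regression (canonical logit link)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1 / (1 + np.exp(-eta))      # inverse link: fitted means
        W = mu * (1 - mu)                # working weights: b''(theta_i) for the Bernoulli
        grad = X.T @ (y - mu)            # score: X'(y - mu)
        hess = X.T @ (X * W[:, None])    # X' W X (negative Hessian)
        step = np.linalg.solve(hess, grad)
        beta = beta + step               # Newton step == one weighted least-squares solve
        if np.max(np.abs(step)) < tol:
            break
    return beta
```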
Deviance — the GLM loss
The natural goodness-of-fit measure is the deviance, $D = 2\,a(\phi)\,\big(\ell_{\text{saturated}} - \ell_{\text{model}}\big)$,
where the saturated model fits each observation perfectly (one free mean per observation). For Gaussian responses, the deviance reduces to the residual sum of squares. For Bernoulli responses, it is twice the binary cross-entropy. Deviance is the right "loss" to minimise within the GLM framework; for non-Gaussian responses, plain MSE corresponds to the wrong likelihood and gives inefficient or distorted fits.
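Both special cases are easy to check numerically; the fitted means below are hypothetical placeholders, not real data:

```python
import numpy as np

# Gaussian: deviance = residual sum of squares.
y_gauss = np.array([1.2, 0.4, 2.5])
mu_gauss = np.array([1.0, 0.5, 2.0])
dev_gaussian = np.sum((y_gauss - mu_gauss) ** 2)

# Bernoulli: deviance = 2 * binary cross-entropy (summed over observations).
y_bin = np.array([1.0, 0.0, 1.0])
mu_bin = np.array([0.8, 0.3, 0.6])
dev_bernoulli = -2 * np.sum(y_bin * np.log(mu_bin) + (1 - y_bin) * np.log(1 - mu_bin))

print(dev_gaussian, dev_bernoulli)
```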
Why GLMs matter today
Three reasons:
- Insurance, epidemiology, social science — count and rate data is everywhere; Poisson and negative-binomial GLMs are the standard.
- Interpretability — coefficients have clean meaning (multiplicative effect on the mean for log-link models, odds ratio for logit), which matters in regulated domains.
- Conceptual link to deep learning — the final layer of many modern networks is a GLM in disguise. Choosing the right output activation and loss is choosing the right (link, distribution) pair.
For the deep-learning practitioner, GLMs are the correct mental model for what your output head should be:
- Real-valued target → linear output + MSE (Gaussian GLM).
- Binary target → sigmoid + BCE (Bernoulli GLM).
- Categorical target → softmax + cross-entropy (multinomial GLM).
- Count target → exp output + Poisson NLL (Poisson GLM).
- Positive continuous → exp output + gamma NLL (gamma GLM).
Picking the right combination matters more than tweaking the network body.
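In PyTorch terms, that pairing might look like the sketch below; the tensors are random placeholders and the layer sizes are arbitrary assumptions:

```python
import torch
import torch.nn as nn

features = torch.randn(8, 16)             # stand-in for penultimate-layer activations
head = nn.Linear(16, 1)
eta = head(features).squeeze(-1)          # the linear predictor ("logits")

# Gaussian GLM: identity link + MSE.
mse = nn.MSELoss()(eta, torch.randn(8))

# Bernoulli GLM: logit link + binary cross-entropy (sigmoid folded into the loss).
bce = nn.BCEWithLogitsLoss()(eta, torch.randint(0, 2, (8,)).float())

# Multinomial GLM: softmax link + cross-entropy (softmax folded into the loss).
class_head = nn.Linear(16, 5)
ce = nn.CrossEntropyLoss()(class_head(features), torch.randint(0, 5, (8,)))

# Poisson GLM: log link + Poisson NLL (log_input=True treats eta as log of the mean).
pnll = nn.PoissonNLLLoss(log_input=True)(eta, torch.poisson(torch.ones(8)))
```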
What to read next
- Logistic Regression — the most-used GLM.
- Ordinary Least Squares — the Gaussian-identity GLM.
- Loss Functions — deep-learning loss choices in GLM language.