Linear Filters & Convolution
A linear filter is a small array of weights ("kernel") swept across an image, replacing each pixel with a weighted combination of its neighbourhood. This is the workhorse operation of classical computer vision and the structural template for the convolutional neural network. The two ideas to internalise are convolution as a linear operator and the Fourier-domain interpretation that makes large kernels practical.
Convolution and correlation
Given an image $I$ and a kernel $K$, the discrete convolution is

$$(I * K)(x, y) = \sum_{u,v} K(u, v)\, I(x - u,\, y - v).$$

Cross-correlation is the same with a sign flip ($I(x + u,\, y + v)$ inside the sum instead of $I(x - u,\, y - v)$). The distinction only matters for asymmetric kernels; many libraries, including most deep-learning frameworks, actually implement correlation under the name "convolution".

Convolution is linear and translation-equivariant: the response of a shifted image is the shifted response of the original. Conversely, any linear, translation-equivariant operator on images can be written as a convolution, which is why the kernel formalism is so general.
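The flip distinction can be made concrete with a minimal numpy sketch (the function names and the "valid"-mode output choice are mine, not from any particular library):

```python
import numpy as np

def conv2d(image, kernel):
    """Direct 2D convolution, 'valid' mode: the kernel is flipped,
    matching the (x - u, y - v) indexing in the definition."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    flipped = kernel[::-1, ::-1]  # the sign flip
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

def xcorr2d(image, kernel):
    """Cross-correlation: the same sliding sum, kernel not flipped."""
    return conv2d(image, kernel[::-1, ::-1])
```

For a symmetric kernel the two operations coincide; for an asymmetric one (e.g. a difference kernel) they differ in sign, which is exactly why the distinction is easy to miss until a derivative filter comes out negated.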
Smoothing: box and Gaussian
The two canonical low-pass filters:
- Box filter — uniform $1/(2k+1)^2$ weights over a $(2k+1) \times (2k+1)$ window. Cheap (separable, runnable in $O(1)$ per pixel via an integral image), but its frequency response is a sinc, whose sidelobes cause ringing.
- Gaussian filter — kernel $G_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/(2\sigma^2)}$. Optimal joint localisation in space and frequency (uncertainty principle), separable into 1D Gaussians, and the unique smoothing kernel for which the pyramid construction is mathematically well-posed.
Smoothing is what makes downstream operations (gradients, edges, feature detection) numerically sane — derivatives of unsmoothed images are dominated by sensor noise.
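Separability is what keeps Gaussian smoothing cheap in practice. A numpy sketch (the $3\sigma$ truncation radius is a common convention, assumed here rather than taken from the text):

```python
import numpy as np

def gaussian_1d(sigma, radius=None):
    """Sampled, normalised 1D Gaussian kernel."""
    if radius is None:
        radius = int(3 * sigma)  # common truncation choice (assumption)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

sigma = 1.5
g = gaussian_1d(sigma)
# Separable: G_sigma(x, y) factors as g(x) * g(y), so the 2D kernel
# is a rank-1 outer product of the 1D kernel with itself.
kernel_2d = np.outer(g, g)
```

Filtering with `kernel_2d` costs $O(k^2)$ multiplies per pixel; two 1D passes with `g` (rows, then columns) give the identical result in $O(2k)$.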
Differentiation: Sobel, Scharr, Laplacian
Image derivatives are computed by convolution with derivative-of-Gaussian or fixed difference kernels, e.g. the horizontal Sobel kernel

$$S_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}.$$

Sobel/Scharr give first derivatives ($\partial I/\partial x$, $\partial I/\partial y$), from which gradient magnitude and orientation follow; Scharr's coefficients are tuned for better rotational symmetry. The Laplacian $\nabla^2 I = \partial^2 I/\partial x^2 + \partial^2 I/\partial y^2$ gives an isotropic second derivative; because differentiation amplifies high-frequency noise, it is applied in practice to a Gaussian-smoothed image (Laplacian of Gaussian).
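Gradient magnitude and orientation from the Sobel masks can be sketched as follows (brute-force loops for clarity; the edge-replicate padding is my choice, and the masks are applied as correlation, the usual convention):

```python
import numpy as np

SOBEL_X = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.T

def sobel_gradients(image):
    """Gradient magnitude and orientation via the Sobel masks."""
    pad = np.pad(image, 1, mode='edge')  # replicate borders (assumption)
    gx = np.zeros_like(image, dtype=float)
    gy = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * SOBEL_X)
            gy[i, j] = np.sum(patch * SOBEL_Y)
    mag = np.hypot(gx, gy)          # |∇I|
    theta = np.arctan2(gy, gx)      # gradient orientation
    return mag, theta
```

On a horizontal intensity ramp the interior response is a constant $g_x$ (eight times the per-pixel step, from the $1+2+1$ column weights) with $g_y = 0$, i.e. orientation zero.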
Frequency-domain view
The Fourier transform diagonalises convolution: $\mathcal{F}\{I * K\} = \mathcal{F}\{I\} \cdot \mathcal{F}\{K\}$ (the convolution theorem). Two consequences:
- Asymptotic complexity — for a kernel of size $k \times k$ on an $N \times N$ image, direct convolution is $O(N^2 k^2)$ while FFT-based convolution is $O(N^2 \log N)$, independent of $k$. The crossover is around $k \approx 11$–$15$ for typical implementations.
- Filter design — designing a kernel becomes shaping its frequency response. Low-pass (Gaussian, box), high-pass (Laplacian, unsharp mask), and band-pass (DoG, Gabor) filters are all characterised by their amplitude spectrum.
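The convolution theorem can be checked directly with numpy's FFT. This sketch computes circular (periodic-boundary) convolution, which is what the DFT natively gives; the kernel-centering roll is my choice so the output is not spatially shifted:

```python
import numpy as np

def fft_convolve(image, kernel):
    """Circular 2D convolution via F{I*K} = F{I} · F{K}."""
    kh, kw = kernel.shape
    K = np.zeros_like(image, dtype=float)
    K[:kh, :kw] = kernel
    # Centre the kernel at the origin so the result is unshifted.
    K = np.roll(K, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(K)))
```

A quick sanity check: a normalised $3 \times 3$ box kernel applied this way must equal the average of the nine circularly shifted copies of the image.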
Non-linear extension: bilateral filter
Linear filters that smooth also blur edges, which is often unwanted. The bilateral filter (Tomasi & Manduchi, 1998) makes the kernel weights depend on intensity similarity as well as spatial distance:

$$BF[I](p) = \frac{1}{W_p} \sum_{q \in \mathcal{N}(p)} G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}\big(|I(p) - I(q)|\big)\, I(q),$$

where $G_{\sigma_s}$ is the spatial Gaussian, $G_{\sigma_r}$ the range (intensity) Gaussian, and $W_p$ the sum of the weights, which normalises the response.
The result is edge-preserving smoothing — the precursor to non-local means, BM3D, and (eventually) attention itself.
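A brute-force sketch of the bilateral filter for a grayscale image in $[0, 1]$ (parameter names and defaults are mine; real implementations use approximations to avoid the $O(k^2)$ per-pixel cost):

```python
import numpy as np

def bilateral(image, sigma_s=2.0, sigma_r=0.1, radius=4):
    """Brute-force bilateral filter: spatial Gaussian x range Gaussian."""
    H, W = image.shape
    out = np.zeros_like(image, dtype=float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))  # G_{sigma_s}
    pad = np.pad(image, radius, mode='edge')
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # G_{sigma_r}: down-weight neighbours with dissimilar intensity
            rng_w = np.exp(-(patch - image[i, j])**2 / (2 * sigma_r**2))
            w = spatial * rng_w
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out
```

On a step edge with a small $\sigma_r$, cross-edge weights are essentially zero, so each side is smoothed only with itself and the edge survives — the behaviour a plain Gaussian cannot deliver.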
What to read next
- Edges & Corners — the immediate consumer of gradients.
- Pyramids & Scale-Space — repeated Gaussian smoothing builds multi-scale representations.
- Convolution (Deep) — convolution as a learnable layer.