Image Pyramids & Scale Space
A real scene contains structure at many sizes: a face spans a few hundred pixels close up and a few dozen at a distance, so detection and matching algorithms must respond at the right scale. The classical answer is the pyramid, a stack of progressively smoothed and downsampled copies of an image; the formal answer is scale-space theory, which singles out the Gaussian as the unique smoothing kernel that generates coarser scales without creating spurious structure.
Gaussian and Laplacian pyramids
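The Gaussian pyramid (Burt & Adelson, 1983) is built by alternating Gaussian smoothing and 2× downsampling. Level $\ell + 1$ is computed from level $\ell$ as
$$G_{\ell+1} = \big(g_\sigma * G_\ell\big){\downarrow_2},$$
with $G_0 = I$ the input image, $g_\sigma$ a Gaussian kernel, and $\downarrow_2$ keeping every second row and column, so each level has a quarter of the pixels of the level below.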
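The smooth-then-subsample recursion is short enough to sketch directly. This is a minimal NumPy version, assuming a float grayscale image, a sampled Gaussian with σ = 1 truncated at 3σ (standing in for Burt & Adelson's 5-tap generating kernel), and reflect padding at the borders; the helper names are illustrative, not from any library.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Sampled Gaussian truncated at 3 sigma, normalised to sum to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding; output has the input's shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Convolve every row, then every column; "valid" mode strips the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def gaussian_pyramid(img, levels, sigma=1.0):
    """Return [G_0, ..., G_{levels-1}]: smooth, then keep every second row and column."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(gaussian_blur(pyramid[-1], sigma)[::2, ::2])
    return pyramid
```

For a 64 × 64 input and four levels the shapes are 64 × 64, 32 × 32, 16 × 16 and 8 × 8; odd sizes round up, so a 33-row image gives 17 and then 9 rows.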
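The Laplacian pyramid stores the differences between adjacent Gaussian levels: writing $G_\ell$ for Gaussian level $\ell$ and $\uparrow_2$ for 2× upsampling followed by the same smoothing, each band is $L_\ell = G_\ell - {\uparrow_2}\, G_{\ell+1}$, and the coarsest Gaussian level $G_N$ is kept as a low-pass residual. The result is a band-pass decomposition from which the image is reconstructed exactly by upsampling $G_N$ and adding the bands back one level at a time. It is the foundation of multi-resolution image blending, used in panorama stitching, exposure fusion, and image inpainting, and a precursor to the wavelet transform.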
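The band construction and exact reconstruction can be sketched in a few lines. This version assumes the same sampled-Gaussian blur as a Gaussian pyramid (σ = 1, reflect padding) and upsamples by zero insertion followed by smoothing, scaled by 4 to restore brightness; `reduce_level` and `expand_level` are hypothetical names echoing Burt & Adelson's REDUCE/EXPAND operators. Reconstruction is exact for any choice of upsampler, as long as decomposition and reconstruction use the same one.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable sampled-Gaussian blur (truncated at 3 sigma) with reflect padding."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def reduce_level(img, sigma=1.0):
    """Smooth, then keep every second row and column."""
    return gaussian_blur(img, sigma)[::2, ::2]


def expand_level(img, shape, sigma=1.0):
    """Upsample to `shape` by zero insertion and smoothing; the factor 4 restores brightness."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * gaussian_blur(up, sigma)


def laplacian_pyramid(img, levels, sigma=1.0):
    """Band-pass levels G_l - expand(G_{l+1}), followed by the coarsest Gaussian level."""
    gauss = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        gauss.append(reduce_level(gauss[-1], sigma))
    bands = [g - expand_level(g_next, g.shape, sigma) for g, g_next in zip(gauss, gauss[1:])]
    bands.append(gauss[-1])  # low-pass residual
    return bands


def reconstruct(bands, sigma=1.0):
    """Start from the coarsest level, then repeatedly upsample and add the next band."""
    img = bands[-1]
    for band in reversed(bands[:-1]):
        img = band + expand_level(img, band.shape, sigma)
    return img
```

Blending two images with this structure amounts to mixing their bands level by level under a Gaussian pyramid of the mask before calling `reconstruct`.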
Scale-space and the uniqueness of the Gaussian
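Lindeberg's Scale-Space Theory in Computer Vision (1994) formalises what "smoothing" should mean. A scale-space representation of an image $f$ is the one-parameter family
$$L(\cdot\,; t) = g(\cdot\,; t) * f, \qquad g(x, y; t) = \frac{1}{2\pi t}\, e^{-(x^2 + y^2)/(2t)},$$
indexed by the scale parameter $t = \sigma^2 \ge 0$, with $L(\cdot\,; 0) = f$; equivalently, $L$ solves the diffusion equation $\partial_t L = \tfrac{1}{2}\nabla^2 L$. Imposing a small set of axioms, such as linearity, shift invariance, the semigroup property $g(\cdot\,; t_1) * g(\cdot\,; t_2) = g(\cdot\,; t_1 + t_2)$, and non-enhancement of local extrema (a local maximum never increases and a local minimum never decreases as $t$ grows), singles out the Gaussian as the unique admissible kernel.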
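The semigroup axiom is easy to check numerically. The sketch below assumes 1D Gaussians sampled on the integers, truncated at 6σ, and parameterised by $t = \sigma^2$; it convolves the kernels for $t_1$ and $t_2$ and compares the result with the kernel for $t_1 + t_2$. The helper names are illustrative, and for σ of roughly 1.5 or more the remaining discrepancy comes only from truncation and floating-point error.

```python
import numpy as np


def sampled_gaussian(sigma, radius=None):
    """1D Gaussian sampled on integers, truncated at 6 sigma by default, summing to 1."""
    if radius is None:
        radius = int(np.ceil(6 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()


def semigroup_error(t1, t2):
    """Max deviation between g(t1) * g(t2) and g(t1 + t2), with t = sigma**2."""
    composed = np.convolve(sampled_gaussian(np.sqrt(t1)), sampled_gaussian(np.sqrt(t2)))
    # The full convolution of two odd-length symmetric kernels is centred at its midpoint.
    radius = (len(composed) - 1) // 2
    direct = sampled_gaussian(np.sqrt(t1 + t2), radius=radius)
    return np.max(np.abs(composed - direct))
```

Smoothing twice with σ = 2 and σ = 3 is therefore indistinguishable from smoothing once with σ = √13: scales compose by adding variances, which is why a scale space can be built incrementally.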
The Gaussian's role across CV — from filters to SIFT to learned scale-space embeddings — flows from this uniqueness result.
Difference of Gaussians and SIFT detection
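The DoG operator approximates the scale-normalised Laplacian. Because the Gaussian satisfies the diffusion equation in the form $\partial G / \partial \sigma = \sigma \nabla^2 G$, a finite difference across scale gives
$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) \approx (k - 1)\, \sigma^2 \nabla^2 G * I,$$
with $k > 1$ the constant ratio between adjacent scales. Since the factor $k - 1$ is the same at every scale, extrema of $D$ coincide with those of the scale-normalised Laplacian $\sigma^2 \nabla^2 G * I$ without any explicit normalisation. SIFT (Lowe, 2004) uses $k = 2^{1/s}$ for $s$ intervals per octave and detects keypoints as local extrema of $D$ over space and scale, comparing each sample with its 26 neighbours in a $3 \times 3 \times 3$ neighbourhood.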
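A single-octave sketch of DoG extremum detection follows. It assumes a float grayscale image, $\sigma_0 = 1.6$ and $s = 3$ (the values Lowe reports), blurs each level directly from the input rather than incrementally, and uses a hypothetical contrast threshold of 0.01. It omits subpixel refinement, edge-response rejection, and the octave-to-octave downsampling of full SIFT, so it illustrates only the 26-neighbour test.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable sampled-Gaussian blur (truncated at 4 sigma) with reflect padding."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def dog_keypoints(img, sigma0=1.6, s=3, threshold=0.01):
    """Return (y, x, sigma) for samples that are extrema among their 26 DoG neighbours."""
    k = 2.0 ** (1.0 / s)
    sigmas = [sigma0 * k**i for i in range(s + 3)]  # s + 3 blurred images per octave
    gauss = np.stack([gaussian_blur(img, sg) for sg in sigmas])
    dog = gauss[1:] - gauss[:-1]  # s + 2 DoG levels
    _, h, w = dog.shape
    keypoints = []
    # Only the s interior levels have a DoG level both above and below them.
    for l in range(1, dog.shape[0] - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dog[l, y, x]
                if abs(v) < threshold:
                    continue  # reject low-contrast responses
                cube = dog[l - 1:l + 2, y - 1:y + 2, x - 1:x + 2]
                if v == cube.max() or v == cube.min():
                    keypoints.append((y, x, sigmas[l]))
    return keypoints
```

For a single bright Gaussian blob of standard deviation 3 centred at (32, 32) in a 64 × 64 image, the centre pixel is returned as a DoG minimum: the DoG response there is most negative at an interior scale, so it passes the test across scale as well as across space.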
Pyramid pooling in deep nets
The pyramid idea persists into the deep era. Spatial Pyramid Pooling (He et al., 2014) pools CNN features over grids at several resolutions, producing a fixed-length vector from inputs of arbitrary size and making classification more robust to object scale; Feature Pyramid Networks (Lin et al., CVPR 2017) add a top-down pathway with lateral connections that produces semantically rich features at every spatial scale, and are now standard in modern detectors and segmenters. The classical theory tells you which axis to vary, even when the smoothing is no longer literally Gaussian.
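The pooling step of Spatial Pyramid Pooling can be sketched on a single feature map. This NumPy version assumes a (C, H, W) array, max pooling, and illustrative grid sizes of 1 × 1, 2 × 2 and 4 × 4; bins come from a simple floor/ceil partition rather than the window-and-stride arithmetic of the paper, and the function name is hypothetical.

```python
import numpy as np


def spatial_pyramid_pool(features, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over n x n grids for each n in `levels`.

    Returns a vector of length C * sum(n * n), independent of H and W.
    """
    _, h, w = features.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row bin i spans floor(i*h/n) .. ceil((i+1)*h/n), so it is never empty.
            y0, y1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                x0, x1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                pooled.append(features[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)
```

With 8 channels, maps of size 13 × 17 and 32 × 20 both pool to vectors of length 8 × (1 + 4 + 16) = 168, which is what lets a fully connected head accept inputs of any size.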
What to read next
- Linear Filters & Convolution — Gaussian smoothing is the building block.
- Local Feature Descriptors — SIFT detection is built on DoG scale-space.
- Edges & Corners — multi-scale extension of derivative operators.