
Camera Models & Calibration

Calibration recovers a camera's intrinsic matrix K and lens-distortion coefficients — the constants that any later geometric reasoning (stereo, optical flow, SfM) needs in order to map between pixels and rays. This page assumes the pinhole image-formation model and walks through the standard procedure.

What we want to recover

For a single camera under the pinhole + Brown–Conrady model, calibration solves for:

  • Focal lengths (fx,fy), principal point (cx,cy), and (rarely) skew s — the five entries of K.
  • Radial distortion (k1,k2,k3) and tangential distortion (p1,p2).
  • Implicitly, the per-image extrinsics (Ri,ti) used during the calibration capture.
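For concreteness, the forward model these parameters define can be sketched in numpy — a minimal sketch, with illustrative placeholder values for K and the distortion coefficients:

```python
import numpy as np

def distort(xn, yn, k1, k2, k3, p1, p2):
    """Apply Brown-Conrady radial + tangential distortion to
    normalized image coordinates (x/z, y/z)."""
    r2 = xn**2 + yn**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn**2)
    yd = yn * radial + p1 * (r2 + 2 * yn**2) + 2 * p2 * xn * yn
    return xd, yd

def project(X, K, dist):
    """Pinhole projection of a camera-frame 3D point: perspective
    divide, then distortion, then the intrinsic matrix K."""
    xn, yn = X[0] / X[2], X[1] / X[2]
    xd, yd = distort(xn, yn, *dist)
    fx, s, cx = K[0]
    _, fy, cy = K[1]
    return np.array([fx * xd + s * yd + cx, fy * yd + cy])

# Illustrative values only — real ones come out of calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 805.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = (-0.2, 0.05, 0.0, 1e-3, -1e-3)
print(project(np.array([0.1, -0.05, 2.0]), K, dist))
```

A point on the optical axis maps to the principal point (cx, cy) regardless of distortion, which is a quick sanity check on any implementation of this model.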

Zhang's method

A Flexible New Technique for Camera Calibration (Zhang, PAMI 2000) is the standard procedure implemented by mainstream CV libraries. Capture multiple views (at least three) of a planar calibration pattern (typically a checkerboard) at varying poses. For each view:

  1. Detect the corners of the checkerboard (sub-pixel refined via the gradient response).
  2. Compute the homography Hi relating the planar pattern to the image, using the known board geometry.
  3. Each Hi provides two constraints on the image of the absolute conic ω = K⁻ᵀK⁻¹, a 3×3 symmetric matrix with five degrees of freedom.
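The linear stage (steps 2–3 plus the ω solve and Cholesky recovery described below) can be sketched in numpy. The synthetic homographies here stand in for ones estimated from real corner detections, and six views are used for numerical robustness; all numbers are illustrative:

```python
import numpy as np

def v_ij(H, i, j):
    """Row vector such that h_i^T ω h_j = v_ij · b, where
    b = (ω11, ω12, ω22, ω13, ω23, ω33) stacks the symmetric ω."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def solve_omega(Hs):
    """Stack the two constraints per view (h1^T ω h2 = 0 and
    h1^T ω h1 = h2^T ω h2) and take the null space of V via SVD."""
    V = np.concatenate([np.stack([v_ij(H, 0, 1),
                                  v_ij(H, 0, 0) - v_ij(H, 1, 1)])
                        for H in Hs])
    b = np.linalg.svd(V)[2][-1]
    return np.array([[b[0], b[1], b[3]],
                     [b[1], b[2], b[4]],
                     [b[3], b[4], b[5]]])

def intrinsics_from_omega(omega):
    """ω = K^{-T} K^{-1}: fix the sign, Cholesky, invert, renormalise."""
    if omega[2, 2] < 0:               # null vector sign is arbitrary
        omega = -omega
    L = np.linalg.cholesky(omega)     # ω = L L^T, L lower-triangular
    Krec = np.linalg.inv(L.T)         # L = K^{-T}  =>  K = (L^T)^{-1}
    return Krec / Krec[2, 2]

# Synthetic check: a plane's homography in view i is H_i = K [r1 r2 t].
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 805.0, 240.0],
              [0.0, 0.0, 1.0]])

def random_view():
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    R = Q * np.sign(np.linalg.det(Q))            # proper rotation
    t = np.array([*rng.normal(size=2), 3.0])     # board in front of camera
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

omega = solve_omega([random_view() for _ in range(6)])
print(np.round(intrinsics_from_omega(omega), 1))
```

With noiseless synthetic homographies the recovered K matches the ground truth; with real, noisy corner detections this linear estimate is only a starting point for the non-linear refinement.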

Stack the constraints across views, solve linearly for ω, recover K via Cholesky, then refine (K,{Ri,ti},k,p) jointly by minimising reprojection error over all detected corners with Levenberg–Marquardt:

min over K, k, p, {Ri, ti} of Σᵢ Σⱼ ‖ xij − π(K, k, p; Ri, ti; Xj) ‖²

This non-linear refinement is what delivers the sub-pixel reprojection error that real applications need.
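The residuals summed in the objective above are easy to evaluate directly. A minimal sketch, omitting distortion for brevity and using a synthetic board and pose (all values illustrative):

```python
import numpy as np

def reprojection_rms(K, poses, X_board, observations):
    """RMS reprojection error over all views and corners — the
    quantity driven down by the Levenberg-Marquardt refinement."""
    errs = []
    for (R, t), obs in zip(poses, observations):
        Xc = X_board @ R.T + t                  # board points -> camera frame
        uv = Xc @ K.T
        uv = uv[:, :2] / uv[:, 2:3]             # perspective divide
        errs.append(np.linalg.norm(uv - obs, axis=1))
    e = np.concatenate(errs)
    return float(np.sqrt(np.mean(e**2)))

# One-view synthetic check: a 7x5 board with 3 cm squares.
xs, ys = np.meshgrid(np.arange(7), np.arange(5))
X_board = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(35)]) * 0.03
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 805.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 1.0])
Xc = X_board @ R.T + t
obs = (Xc @ K.T)[:, :2] / (Xc @ K.T)[:, 2:3]    # exact observations

print(reprojection_rms(K, [(R, t)], X_board, [obs]))  # 0.0
```

In practice one inspects this number after calibration: a few tenths of a pixel is typical of a good capture, while values near a pixel or more usually indicate blurred corners, an unmodeled distortion term, or a degenerate set of poses.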

Self-calibration and online estimation

Calibration with a known pattern is the right baseline, but for arbitrary footage one cannot insist on a checkerboard. Self-calibration (Faugeras, Hartley) recovers the intrinsics from rigid scene geometry alone, using constraints between fundamental matrices across views. It needs many wide-baseline views and is inherently more brittle than Zhang's method.

In modern systems, intrinsics are often estimated online during SLAM or SfM, starting from a rough guess and treating K as additional unknowns inside bundle adjustment. Learned methods regress focal length and distortion directly from a single image, but their accuracy still lags pattern-based calibration whenever a pattern is available.

Multi-camera and rolling shutter extensions

A stereo or multi-camera rig requires recovering the inter-camera extrinsics (Rij,tij) in addition to per-camera intrinsics. Tools like Kalibr handle multi-IMU + multi-camera rigs with time offsets. Rolling-shutter sensors (most CMOS cameras) further require modeling the per-row exposure timing — ignored by classical pinhole calibration but visible as wobble in fast motion.
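Given per-camera world-to-camera extrinsics (Ri, ti), the inter-camera transform follows by composition. A minimal sketch with an invented two-camera rig (the 30° yaw and 10 cm baseline are illustrative):

```python
import numpy as np

def relative_extrinsics(R1, t1, R2, t2):
    """Camera-1 -> camera-2 transform from world-to-camera extrinsics:
    X2 = R2 X + t2 and X1 = R1 X + t1  =>  R12 = R2 R1^T, t12 = t2 - R12 t1."""
    R12 = R2 @ R1.T
    return R12, t2 - R12 @ t1

# Illustrative rig: camera 2 yawed 30 degrees, offset 10 cm along x.
a = np.deg2rad(30.0)
R1, t1 = np.eye(3), np.zeros(3)
R2 = np.array([[np.cos(a), 0.0, np.sin(a)],
               [0.0, 1.0, 0.0],
               [-np.sin(a), 0.0, np.cos(a)]])
t2 = np.array([-0.1, 0.0, 0.0])

R12, t12 = relative_extrinsics(R1, t1, R2, t2)
X = np.array([0.3, -0.2, 2.0])                  # any world point
assert np.allclose(R12 @ (R1 @ X + t1) + t12, R2 @ X + t2)
```

In a rigid rig (Rij, tij) is constant across the capture, so in practice it is estimated jointly from views where both cameras see the pattern, rather than by differencing independently estimated poses as in this sketch.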

Released under the MIT License. Content imported and adapted from NoteNextra.