Image Formation & Cameras

An image is the result of light reflecting off scene surfaces, traveling through a lens, and being sampled by a 2D sensor. Computer vision starts from the geometry and radiometry of that process — the mapping from a 3D world point to a pixel intensity. Every later module (calibration, stereo, SfM) builds directly on this model.

Pinhole projection

The simplest camera is the pinhole: a single small aperture and a planar image sensor at focal distance $f$ behind it. A 3D point $\mathbf{X} = (X, Y, Z)$ in the camera frame projects to the image plane via similar triangles:

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}.$$

In homogeneous coordinates this is a linear map. Stacked with intrinsics (focal length, principal point, pixel scaling) and extrinsics (rotation $R$, translation $t$), the full projection is the camera matrix $P = K[R \mid t]$, giving $\mathbf{x} \simeq P\mathbf{X}$.
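
A minimal NumPy sketch of this projection, with invented intrinsics and an identity pose (none of these values come from the text):

```python
import numpy as np

# Assumed example intrinsics: focal lengths and principal point in pixels, zero skew.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Assumed extrinsics: world frame coincides with the camera frame.
R = np.eye(3)
t = np.zeros((3, 1))

P = K @ np.hstack([R, t])            # 3x4 camera matrix P = K [R | t]

X = np.array([0.5, 0.2, 4.0, 1.0])   # homogeneous world point (X, Y, Z, 1)
x = P @ X                            # homogeneous image point
u, v = x[:2] / x[2]                  # divide by depth Z to get pixel coordinates
print(u, v)                          # -> 420.0 280.0
```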

The pinhole model is the working substrate of nearly all multi-view geometry — it is exactly invertible up to an unknown depth along the viewing ray, and it is the source of every "lift a 2D point to a 3D ray" operation.

Intrinsic and extrinsic parameters

  • Intrinsics $K$ — focal lengths $(f_x, f_y)$, principal point $(c_x, c_y)$, optionally a skew term. They depend only on the camera body + lens combination.
  • Extrinsics $(R, t)$ — the rigid transform from world coordinates into the camera's coordinate frame.

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad P = K[R \mid t].$$

Intrinsics are recovered by calibration; extrinsics are estimated per-image during pose estimation, SLAM, or SfM.
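
Inverting the intrinsics makes the "lift a 2D point to a 3D ray" operation above concrete; a sketch reusing the same assumed $K$:

```python
import numpy as np

# Same assumed example intrinsics as above.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

u, v = 420.0, 280.0                             # pixel observed in the image
ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing direction in the camera frame
ray /= np.linalg.norm(ray)                      # normalise to a unit ray

# Depth is unobservable from a single image: every scale > 0 on the ray is consistent.
X = 4.0 * ray / ray[2]                          # e.g. the candidate point at depth Z = 4
print(X)                                        # -> [0.5 0.2 4. ]
```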

Lens distortion

Real lenses deviate from the pinhole. The two dominant components are radial distortion (barrel/pincushion warping that depends on distance from the optical centre) and tangential distortion (slight lens decentering). The Brown–Conrady model is the standard:

$$
\begin{aligned}
x_d &= x\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2 p_1 x y + p_2\left(r^2 + 2x^2\right),\\
y_d &= y\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2y^2\right) + 2 p_2 x y,
\end{aligned}
$$

with $r^2 = x^2 + y^2$. Modelling distortion and undistorting images is a prerequisite for any geometry-based downstream task.
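
A direct transcription of the forward model, with invented coefficient values for illustration (OpenCV's calibration routines take the same coefficients, ordered $(k_1, k_2, p_1, p_2, k_3)$):

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Brown-Conrady forward model on normalized coordinates (x, y) = (X/Z, Y/Z)."""
    r2 = x * x + y * y                             # squared distance from optical centre
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# Illustrative (made-up) coefficients: mild barrel distortion plus slight decentering.
print(distort(0.3, 0.2, k1=-0.2, k2=0.05, k3=0.0, p1=0.001, p2=-0.0005))
```

Undistortion inverts this map; the inverse has no closed form, so it is typically computed iteratively or via a precomputed remap.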

Radiometry: how intensity is formed

The value at a pixel depends on the irradiance hitting the sensor, which in turn depends on scene radiance, surface BRDF, lighting, exposure, lens vignetting, and sensor response. The simplest reasonable model is the image irradiance equation

$$E = L\,\frac{\pi}{4}\left(\frac{d}{f}\right)^{2}\cos^{4}\alpha,$$

where $L$ is scene radiance, $d$ is the aperture diameter, $f$ is the focal length, and $\alpha$ is the angle from the optical axis. The $\cos^4 \alpha$ term is the natural source of vignetting (darker corners). Beyond geometry, sensors apply gamma correction and quantisation, which classical and learned methods alike must remain robust to.
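
The geometric part of this falloff can be tabulated per pixel; a sketch with assumed image size and focal length (both in pixels):

```python
import numpy as np

# Assumed example values: 640x480 sensor, focal length of 800 pixels.
W, H, f = 640, 480, 800.0
u = np.arange(W) - W / 2                       # horizontal offsets from the principal point
v = np.arange(H) - H / 2                       # vertical offsets from the principal point
uu, vv = np.meshgrid(u, v)
cos_alpha = f / np.sqrt(uu**2 + vv**2 + f**2)  # cosine of the off-axis angle per pixel
falloff = cos_alpha**4                         # relative irradiance E / E_center
print(falloff.min())                           # darkest corner, ~0.64 for these values
```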

  • Filters & Convolution — the next layer of the foundations stack: how images are smoothed, sharpened, and differentiated.
  • Camera Calibration — recovering K and the distortion coefficients from images of known patterns.
  • Stereo & Multi-view — observing the same point with two cameras to recover depth by triangulation.
