Image Formation & Cameras
An image is the result of light reflecting off scene surfaces, traveling through a lens, and being sampled by a 2D sensor. Computer vision starts from the geometry and radiometry of that process — the mapping from a 3D world point to a pixel intensity. Every later module (calibration, stereo, SfM) builds on this model.
Pinhole projection
The simplest camera is the pinhole: a single small aperture and a planar image sensor at focal distance $f$ behind it. A 3D point $(X, Y, Z)$ in camera coordinates projects to the image point $(x, y) = (f X / Z,\; f Y / Z)$.

In homogeneous coordinates this is a linear map. Stacked with intrinsics (focal length, principal point, pixel scaling) and extrinsics (rotation $R$ and translation $t$), it gives the full projection $\mathbf{x} \sim K\,[R \mid t]\,\mathbf{X}$.
The pinhole model is the working substrate of nearly all multi-view geometry — it is exactly invertible up to depth and the source of every "lift a 2D point to a 3D ray" operation.
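A minimal numpy sketch of both directions of the pinhole model — projecting camera-frame points to the image plane, and lifting image points back to 3D rays (the function names are mine, not from a standard library):

```python
import numpy as np

def project_pinhole(X_cam, f=1.0):
    """Project camera-frame 3D points (N, 3) onto the image plane: (x, y) = (f X/Z, f Y/Z)."""
    X = np.asarray(X_cam, dtype=float)
    return f * X[:, :2] / X[:, 2:3]

def backproject(xy, f=1.0):
    """Lift 2D image points (N, 2) to unit-norm 3D viewing rays.

    This is the inverse of projection up to the unknown depth along each ray.
    """
    xy = np.asarray(xy, dtype=float)
    rays = np.hstack([xy / f, np.ones((len(xy), 1))])
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)
```

Projecting a point and back-projecting its image recovers the direction of the original point, illustrating "invertible up to depth".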
Intrinsic and extrinsic parameters
- Intrinsics — focal lengths $f_x, f_y$, principal point $(c_x, c_y)$, optionally a skew term $s$, collected in the matrix $K$. They depend only on the camera body + lens combination.
- Extrinsics — the rotation $R$ and translation $t$ forming the rigid transform from world coordinates into the camera's coordinate frame.
Intrinsics are recovered by calibration; extrinsics are estimated per-image during pose estimation, SLAM, or SfM.
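The split between the two parameter sets can be sketched as follows, assuming the standard $\mathbf{x} \sim K[R \mid t]\mathbf{X}$ composition (helper names are mine):

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """Build the 3x3 intrinsic matrix K from focal lengths, principal point, skew."""
    return np.array([[fx, skew, cx],
                     [0.0,  fy, cy],
                     [0.0, 0.0, 1.0]])

def project(K, R, t, X_world):
    """World point -> pixel: apply extrinsics (R, t), then intrinsics K, then dehomogenize."""
    X_cam = R @ X_world + t          # rigid transform into the camera frame
    x = K @ X_cam                    # linear projection in homogeneous coords
    return x[:2] / x[2]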
Lens distortion
Real lenses deviate from the pinhole. The two dominant components are radial distortion (barrel/pincushion warping that depends on distance from the optical centre) and tangential distortion (slight lens decentering). The Brown–Conrady model is the standard:

$$x_d = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)$$
$$y_d = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y$$

with $r^2 = x^2 + y^2$ in normalized image coordinates, radial coefficients $k_1, k_2, k_3$, and tangential coefficients $p_1, p_2$.
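The forward (undistorted → distorted) direction of the Brown–Conrady model is a direct transcription of the equations above; a small sketch in normalized coordinates:

```python
def brown_conrady(x, y, k1, k2, k3, p1, p2):
    """Apply radial + tangential distortion to a normalized image point (x, y)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3   # radial polynomial in r^2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d
```

With all coefficients zero the map is the identity, and the optical centre is a fixed point regardless of the coefficients — both useful unit tests. The inverse map (undistortion) has no closed form and is usually solved iteratively.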
Radiometry: how intensity is formed
Pixel value depends on the irradiance hitting the sensor, which depends on scene radiance, surface BRDF, lighting, exposure, lens vignetting, and sensor response. The simplest reasonable model is the image irradiance equation

$$E = L\,\frac{\pi}{4}\left(\frac{d}{f}\right)^2 \cos^4 \alpha$$

where $E$ is image irradiance, $L$ is scene radiance, $d$ is the aperture diameter, $f$ is the focal length, and $\alpha$ is the angle between the viewing ray and the optical axis. The $\cos^4\alpha$ term is the natural vignetting falloff toward the image corners.
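A one-function sketch of the image irradiance equation, useful for checking how strong the natural $\cos^4$ falloff is at a given field angle (assuming the thin-lens model stated above):

```python
import math

def image_irradiance(L, d, f, alpha):
    """Irradiance on the sensor: E = L * (pi/4) * (d/f)^2 * cos^4(alpha).

    L: scene radiance, d: aperture diameter, f: focal length,
    alpha: angle (radians) between the ray and the optical axis.
    """
    return L * (math.pi / 4.0) * (d / f) ** 2 * math.cos(alpha) ** 4
```

At 60° off-axis the irradiance drops to $\cos^4 60° = 1/16$ of its on-axis value, which is why wide-angle lenses show such pronounced corner darkening.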
What to read next
- Filters & Convolution — the next layer of the foundations stack: how images are smoothed, sharpened, and differentiated.
- Camera Calibration — recovering $K$ and the distortion coefficients from images of known patterns.
- Stereo & Multi-view — observing a single point with two cameras to triangulate its depth.