Two-View Geometry & Stereo
Two cameras observing the same scene from different positions impose a strong geometric constraint on what they can see. Working out that constraint — the epipolar geometry — is what lets us triangulate 3D points from pairs of image observations. Stereo vision is the direct application: a calibrated pair of cameras produces a dense disparity map that converts to depth.
Epipolar geometry and the fundamental matrix
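For two views with camera matrices $P$ and $P'$, a point $x$ in the first image and its match $x'$ in the second (both in homogeneous pixel coordinates) satisfy the epipolar constraint
$$x'^\top F x = 0,$$
where $F$ is the $3 \times 3$ fundamental matrix. It has rank 2 and is defined only up to scale, so it has seven degrees of freedom. The line $l' = F x$ is the epipolar line in the second image on which $x'$ must lie.
When the cameras are calibrated (intrinsics $K$ and $K'$ known), the same constraint can be written in normalised coordinates using the essential matrix $E = K'^\top F K = [t]_\times R$, which encodes the relative rotation $R$ and the translation direction $t$ between the views.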
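To make the epipolar constraint concrete, the NumPy sketch below builds a fundamental matrix from a known relative pose and checks that projected correspondences satisfy it. The intrinsics, rotation, translation, and the helper names `skew`, `fundamental_from_pose`, and `project` are illustrative assumptions, not part of any particular library; `x1` and `x2` hold a point in the first view and its match in the second.

```python
import numpy as np

def skew(v):
    """Return [v]_x, the matrix with skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """F = K2^{-T} [t]_x R K1^{-1} for cameras P1 = K1[I|0], P2 = K2[R|t]."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def project(K, R, t, X):
    """Project (N, 3) world points to homogeneous pixels with last entry 1."""
    x = (K @ (R @ X.T + t[:, None])).T
    return x / x[:, 2:3]

# Hypothetical rig: same intrinsics in both views, 5 degree yaw, 20 cm baseline.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.2, 0.0, 0.0])

F = fundamental_from_pose(K, K, R, t)

# Random 3D points in front of both cameras.
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
x1 = project(K, np.eye(3), np.zeros(3), X)
x2 = project(K, R, t, X)

# Epipolar residual x2^T F x1 for every correspondence (zero up to round-off).
residuals = np.einsum('ij,jk,ik->i', x2, F, x1)
max_residual = np.abs(residuals).max()

# Singular values of F: the smallest is numerically zero because F has rank 2.
singular_values = np.linalg.svd(F, compute_uv=False)
```

Because the data are noise-free, the residuals vanish to machine precision; with real feature matches they would not, which is why estimation needs a robust procedure.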
Estimating $F$: the eight-point algorithm
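Hartley's normalised eight-point algorithm (PAMI 1997) solves for $F$ from at least eight point correspondences $x_i \leftrightarrow x'_i$:
- Normalise image coordinates so that the points in each view are zero-mean with average distance $\sqrt{2}$ from the origin. Skipping this is the classic foot-gun: the linear system becomes badly ill-conditioned.
- Stack the constraint $x_i'^\top F x_i = 0$ for each correspondence into a linear system $A f = 0$, where $f$ is the 9 entries of $F$.
- Solve via SVD, taking $f$ as the right singular vector with the smallest singular value; enforce the rank-2 constraint by zeroing the smallest singular value of the resulting $F$, then undo the normalisation.
- Wrap the estimator in RANSAC. Outliers are inevitable in feature matching, so this is non-negotiable in practice.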
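The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function names, the use of the Sampson distance as the inlier test, and the default threshold and iteration count are assumptions chosen for the example. Inputs are `(N, 2)` arrays of pixel coordinates, with `x1` from the first view and `x2` from the second.

```python
import numpy as np

def normalise(pts):
    """Shift to zero mean and scale so the mean distance from the origin is sqrt(2).

    Returns the normalised homogeneous points (N, 3) and the 3x3 transform T.
    """
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T

def eight_point(x1, x2):
    """Estimate F such that x2^T F x1 = 0 from N >= 8 correspondences."""
    n1, T1 = normalise(x1)
    n2, T2 = normalise(x2)
    # Row i holds the products x2_i[a] * x1_i[b], matching F flattened row by row.
    A = np.einsum('ni,nj->nij', n2, n1).reshape(len(n1), 9)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)           # null vector of A
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                         # enforce rank 2
    F = U @ np.diag(S) @ Vt
    F = T2.T @ F @ T1                  # undo the normalisation
    return F / np.linalg.norm(F)       # fix the arbitrary scale

def sampson_distance(F, x1, x2):
    """First-order approximation of the squared geometric error, in pixels^2."""
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    Fx1 = h1 @ F.T                     # rows are F @ x1_i
    Ftx2 = h2 @ F                      # rows are F^T @ x2_i
    num = np.sum(h2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den

def ransac_eight_point(x1, x2, threshold=1.0, iterations=500, seed=0):
    """Fit F on random 8-point samples, keep the largest consensus set, refit."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(x1), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(x1), size=8, replace=False)
        F = eight_point(x1[sample], x2[sample])
        inliers = sampson_distance(F, x1, x2) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 8:
        raise ValueError("RANSAC found fewer than 8 inliers")
    return eight_point(x1[best_inliers], x2[best_inliers]), best_inliers
```

A call such as `F, mask = ransac_eight_point(x1, x2)` returns the refitted matrix together with a boolean inlier mask. A fixed iteration count keeps the sketch short; production code usually adapts it to the observed inlier ratio.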
Rectification
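Once $F$ is known (or, for a calibrated rig, the relative pose $R$, $t$), the two images can be rectified: each is warped by a homography so that the epipolar lines become horizontal and aligned, and corresponding points share the same image row. The correspondence search then reduces from a 2D problem to a 1D scan along each scanline.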
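For a calibrated pair, one common way to rectify is to rotate both cameras so that their x-axes point along the baseline. The sketch below assumes cameras $P_1 = K[I \mid 0]$ and $P_2 = K[R \mid t]$ with shared intrinsics $K$, zero skew and no lens distortion; the function name `rectifying_homographies` and the numeric values are hypothetical. It then checks on synthetic points that matches end up on the same row.

```python
import numpy as np

def rectifying_homographies(K, R, t):
    """Return (H1, H2, baseline) for cameras P1 = K[I|0], P2 = K[R|t].

    Both views are rotated to a common orientation whose x-axis is the baseline.
    Degenerate when the baseline is parallel to the first camera's optical axis.
    """
    C2 = -R.T @ t                          # centre of camera 2 in camera-1 coordinates
    baseline = np.linalg.norm(C2)
    r1 = C2 / baseline                     # new x-axis: along the baseline
    r2 = np.cross([0.0, 0.0, 1.0], r1)     # new y-axis: orthogonal to old optical axis and baseline
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)                  # new optical axis
    R_rect = np.vstack([r1, r2, r3])
    K_inv = np.linalg.inv(K)
    H1 = K @ R_rect @ K_inv                # warp for the first image
    H2 = K @ R_rect @ R.T @ K_inv          # warp for the second image
    return H1, H2, baseline

def apply_homography(H, pts_h):
    """Map homogeneous points (N, 3) through H and rescale the last entry to 1."""
    out = pts_h @ H.T
    return out / out[:, 2:3]

# Hypothetical rig and synthetic scene.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.2, 0.01, 0.02])

rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(30, 3))
x1 = (K @ X.T).T                           # homogeneous pixels in view 1
x2 = (K @ (R @ X.T + t[:, None])).T        # homogeneous pixels in view 2

H1, H2, baseline = rectifying_homographies(K, R, t)
y1 = apply_homography(H1, x1)
y2 = apply_homography(H2, x2)

row_error = np.abs(y1[:, 1] - y2[:, 1]).max()   # zero up to round-off after rectification
disparity = y1[:, 0] - y2[:, 0]                 # positive, purely horizontal offset
```

In practice the homographies are applied to whole images by resampling, and lens distortion is removed in the same step; this sketch only maps point coordinates.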
Disparity and depth
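In a rectified stereo pair with baseline $B$ and focal length $f$ (in pixels), a scene point at depth $Z$ appears at horizontal positions $x_L$ and $x_R$ in the left and right images. Its disparity $d = x_L - x_R$ is related to depth by
$$Z = \frac{f B}{d}.$$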
Closer points have larger disparity. Estimating the disparity at every pixel is the stereo matching problem: methods range from local block matching (window correlation, SAD/NCC) through Semi-Global Matching (SGM, Hirschmüller, CVPR 2005) — the workhorse of OpenCV's StereoSGBM — to learned cost-volume networks (PSMNet, GA-Net, RAFT-Stereo).
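As a toy illustration of local block matching and the disparity-to-depth conversion, the sketch below runs winner-takes-all SAD matching on a synthetic rectified pair and converts the result with $Z = fB/d$. The function names, window size, focal length and baseline are assumptions for the example, and the brute-force loops are written for clarity, not speed.

```python
import numpy as np

def block_match_sad(left, right, max_disp, half_win=2):
    """Winner-takes-all SAD block matching on a rectified greyscale pair.

    A left pixel at column x is compared with right pixels at columns x - d
    for d in 0..max_disp. Pixels whose window does not fit keep disparity 0.
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        rows = slice(y - half_win, y + half_win + 1)
        for x in range(half_win + max_disp, w - half_win):
            patch = left[rows, x - half_win:x + half_win + 1]
            costs = [
                np.abs(patch - right[rows, x - d - half_win:x - d + half_win + 1]).sum()
                for d in range(max_disp + 1)
            ]
            disp[y, x] = int(np.argmin(costs))   # lowest SAD cost wins
    return disp

def disparity_to_depth(disp, focal_px, baseline_m):
    """Convert disparity in pixels to depth in metres via Z = f * B / d.

    Non-positive disparities are treated as invalid and mapped to infinity.
    """
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.full(disp.shape, np.inf)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth

# Synthetic pair: random texture, right image shifted so every pixel has d = 4.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -4, axis=1)          # right[y, x] == left[y, x + 4]

disp = block_match_sad(left, right, max_disp=8)
depth = disparity_to_depth(disp, focal_px=800.0, baseline_m=0.1)
# Inside the matched region the disparity is 4 px, so depth is 800 * 0.1 / 4 = 20 m.
```

Real scenes break the assumptions that make this toy case easy: textureless regions, occlusions and repeated patterns give ambiguous costs, which is what the smoothness terms in SGM and learned cost volumes are designed to handle.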
What to read next
- Camera Models & Calibration — needed before stereo geometry can yield metric depth.
- Optical Flow — dense correspondence in time rather than across cameras.
- Correspondence & SfM — generalising two-view geometry to many views.