Optical Flow

Optical flow is the apparent 2D motion of pixels between two consecutive video frames. Computing it dense-per-pixel is the temporal counterpart of stereo disparity — both are correspondence problems, but flow operates across time on a single moving camera (or a moving scene), and the displacements can be in any direction, not just horizontal.

The brightness constancy assumption

The classical formulation assumes that a tracked point keeps the same intensity across frames:

I (x, y, t) = I (x + u, y + v, t + 1) .

A first-order Taylor expansion gives the optical flow constraint equation:

I_{x} u + I_{y} v + I_{t} = 0.

A single equation in two unknowns — this is the famous aperture problem. Locally you can recover only the component of motion normal to the gradient; the tangential component is unobservable from one pixel alone. All flow algorithms add some regularisation or larger-window constraint to break the ambiguity.

Lucas–Kanade

An Iterative Image Registration Technique (Lucas, Kanade, IJCAI 1981) takes the windowed-constancy approach: assume flow is constant over a small neighbourhood $W$ , giving a linear system

[\begin{matrix} \sum I_{x}^{2} & \sum I_{x} I_{y} \\ \sum I_{x} I_{y} & \sum I_{y}^{2} \end{matrix}] [\begin{matrix} u \\ v \end{matrix}] = - [\begin{matrix} \sum I_{x} I_{t} \\ \sum I_{y} I_{t} \end{matrix}] .

The $2 \times 2$ matrix on the left is the same second-moment matrix that appears in Harris corner detection: well-conditioned (two large eigenvalues) means a corner, which means a tractable flow estimate. Lucas–Kanade in a pyramid (coarse-to-fine warping) handles displacements larger than the window.

Horn–Schunck — global smoothness

Determining Optical Flow (Horn, Schunck, AI 1981) takes the opposite approach: add a global smoothness regulariser and solve a variational problem,

min_{u, v} \int {(I_{x} u + I_{y} v + I_{t})}^{2} + λ (∥ \nabla u ∥^{2} + ∥ \nabla v ∥^{2}) d x d y .

The Euler–Lagrange equations give a coupled linear system solved by Gauss–Seidel or successive over-relaxation. Horn–Schunck produces dense flow everywhere — at the cost of over-smoothing motion discontinuities at object boundaries.

TV-L1 and modern variational flow

Replacing the L2 data and smoothness terms with L1 (or robust Huber/Charbonnier) penalties makes the estimator robust to brightness violations and edge-preserving across motion boundaries. TV-L1 Optical Flow (Zach et al., DAGM 2007) is the canonical formulation, with primal-dual solvers and GPU implementations that ran near-real-time well before deep methods. The principle — robust data term, total-variation regularisation — survived into many learned methods as auxiliary losses.

Deep era: FlowNet, PWC-Net, RAFT

FlowNet (Dosovitskiy et al., ICCV 2015) was the first end-to-end CNN for flow, supervised on synthetic data. PWC-Net (Sun et al., CVPR 2018) added the classical pyramid-warping-cost-volume structure inside the network. RAFT (Teed, Deng, ECCV 2020) — covered in SfM — is the dominant modern approach: build a 4D all-pairs correlation volume once, then iteratively refine flow with a recurrent update operator. Modern SLAM, video segmentation, and frame interpolation systems all rest on RAFT-style flow.

Optical Flow ​

The brightness constancy assumption ​

Lucas–Kanade ​

Horn–Schunck — global smoothness ​

TV-L1 and modern variational flow ​

Deep era: FlowNet, PWC-Net, RAFT ​

What to read next ​