Linear Algebra Recap

Almost every operation in machine learning is a matrix multiplication. This page is a fast tour of the linear-algebra concepts that recur throughout the rest of the curriculum: vector spaces, the four fundamental subspaces, matrix factorisations (eigendecomposition, SVD), and the geometric interpretations that let you read matrix expressions as transformations rather than indices.

Vectors and inner products

Vectors $x \in \mathbb{R}^n$ have a norm $\|x\|_2 = \sqrt{\sum_i x_i^2}$ and an inner product $\langle x, y \rangle = x^\top y$. The geometric reading: $\langle x, y \rangle = \|x\| \, \|y\| \cos\theta$. Two vectors are orthogonal when $\langle x, y \rangle = 0$.

A basis of $\mathbb{R}^n$ is a set of $n$ linearly independent vectors; an orthonormal basis has unit-norm, pairwise-orthogonal vectors. Coordinates in an orthonormal basis are inner products: $x_i = \langle x, e_i \rangle$.
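
A minimal NumPy sketch of these definitions; the vectors here are made-up examples:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])

norm_x = np.linalg.norm(x)       # ||x||_2 = sqrt(9 + 16) = 5
inner = x @ y                    # <x, y> = -12 + 12 = 0, so x and y are orthogonal
print(norm_x, inner)

# Coordinates in the standard orthonormal basis are inner products <x, e_i>.
e0 = np.array([1.0, 0.0])
assert np.isclose(x @ e0, x[0])
```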

Matrices as linear maps

A matrix $A \in \mathbb{R}^{m \times n}$ is a linear map $\mathbb{R}^n \to \mathbb{R}^m$. Two ways to read $Ax$:

  • Column view: $Ax = \sum_j x_j a_j$ is a linear combination of $A$'s columns.
  • Row view: $(Ax)_i = \langle a_i, x \rangle$ is the inner product of the $i$-th row with $x$.

Matrix multiplication $AB$ composes the maps. Reading dimensions: if $A$ is $m \times n$ and $B$ is $n \times p$, then $AB$ is $m \times p$. The shared dimension $n$ contracts.
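
Both readings, plus the dimension bookkeeping, are easy to verify numerically; the matrices below are arbitrary examples:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # 3x2: a map R^2 -> R^3
x = np.array([1.0, -1.0])

# Column view: Ax is a linear combination of A's columns.
col_view = x[0] * A[:, 0] + x[1] * A[:, 1]

# Row view: (Ax)_i is the inner product of the i-th row with x.
row_view = np.array([row @ x for row in A])

assert np.allclose(A @ x, col_view)
assert np.allclose(A @ x, row_view)

# Composition: (3x2) @ (2x4) -> 3x4; the shared dimension 2 contracts.
B = np.ones((2, 4))
assert (A @ B).shape == (3, 4)
```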

The four fundamental subspaces

Every $A \in \mathbb{R}^{m \times n}$ defines four subspaces:

  • Column space $\mathrm{Col}(A) \subseteq \mathbb{R}^m$: the span of the columns.
  • Row space $\mathrm{Row}(A) = \mathrm{Col}(A^\top) \subseteq \mathbb{R}^n$.
  • Null space $\mathrm{Null}(A) = \{x : Ax = 0\} \subseteq \mathbb{R}^n$.
  • Left null space $\mathrm{Null}(A^\top) \subseteq \mathbb{R}^m$.

Two orthogonality relations: $\mathrm{Row}(A) \perp \mathrm{Null}(A)$ and $\mathrm{Col}(A) \perp \mathrm{Null}(A^\top)$. The rank-nullity theorem, $\mathrm{rank}(A) + \dim \mathrm{Null}(A) = n$, ties them together. These four subspaces are what regression, projection, least squares, and PCA all manipulate.
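
One way to see the subspaces concretely is via the SVD, used here purely as a numerical tool; the rank-1 matrix is a contrived example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # rank 1, so Null(A) has dimension 2

rank = np.linalg.matrix_rank(A)

# Right singular vectors with zero singular value span Null(A).
U, s, Vt = np.linalg.svd(A)
null_basis = Vt[rank:].T         # shape (3, 2)

# Rank-nullity: rank(A) + dim Null(A) = n.
assert rank + null_basis.shape[1] == A.shape[1]

# Row(A) is orthogonal to Null(A): every row annihilates every null vector.
assert np.allclose(A @ null_basis, 0.0)
```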

Eigendecomposition

For square $A \in \mathbb{R}^{n \times n}$, an eigenvector $v$ satisfies $Av = \lambda v$ for some scalar eigenvalue $\lambda$. If $A$ has $n$ linearly independent eigenvectors, it factorises as

$$A = V \Lambda V^{-1},$$

with $V$ the matrix of eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Computing $A^k$ is then trivial: $A^k = V \Lambda^k V^{-1}$.
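
A quick numerical check of both identities, with an arbitrary diagonalisable matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, V = np.linalg.eig(A)        # eigenvalues 3 and 1
assert np.allclose(A, V @ np.diag(lam) @ np.linalg.inv(V))

# A^k = V Λ^k V^{-1}: exponentiate the diagonal instead of multiplying k times.
k = 5
Ak = V @ np.diag(lam**k) @ np.linalg.inv(V)
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```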

For symmetric $A$, the spectral theorem gives orthonormal eigenvectors and real eigenvalues, so $V$ is orthogonal: $A = V \Lambda V^\top$. Symmetric positive-(semi)definite matrices have non-negative eigenvalues; Hessians, covariances, kernels, and Gram matrices all live in this class.
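
For symmetric matrices, `np.linalg.eigh` exploits the spectral theorem directly; a covariance matrix built from random data serves as the example here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
C = np.cov(X.T)                                  # covariance: symmetric PSD

lam, V = np.linalg.eigh(C)                       # real eigenvalues, orthonormal V
assert np.allclose(V.T @ V, np.eye(3))           # V is orthogonal
assert np.all(lam >= -1e-10)                     # PSD: non-negative spectrum
assert np.allclose(C, V @ np.diag(lam) @ V.T)    # A = V Λ Vᵀ
```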

Singular Value Decomposition

For any $A \in \mathbb{R}^{m \times n}$,

$$A = U \Sigma V^\top,$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and $\Sigma \in \mathbb{R}^{m \times n}$ is diagonal with non-negative entries (singular values) in decreasing order. The SVD is the universal matrix factorisation: it always exists, generalises the eigendecomposition to non-square matrices, and is the foundation of:

  • PCA: principal components are the right singular vectors of the centred data matrix (see PCA & SVD).
  • Pseudo-inverse: $A^+ = V \Sigma^+ U^\top$, the right tool for over-/under-determined least squares.
  • Low-rank approximation: the Eckart–Young theorem says the best rank-$k$ approximation is $U_k \Sigma_k V_k^\top$ (the top-$k$ singular triplets).
  • Numerical conditioning: $\sigma_{\max}/\sigma_{\min}$ is the condition number; large values mean inversion is unstable.

SVD is the one matrix factorisation worth being fluent in.
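
The sketch below exercises all four uses on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Pseudo-inverse A+ = V Σ+ Uᵀ (invert the non-zero singular values).
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# Eckart–Young: the best rank-k approximation keeps the top-k triplets.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
assert np.linalg.matrix_rank(A_k) == k

# Condition number: sigma_max / sigma_min.
assert np.isclose(s[0] / s[-1], np.linalg.cond(A))
```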

Norms and conditioning

For a matrix $A$ with singular values $\sigma_i$:

  • Frobenius norm $\|A\|_F = \sqrt{\sum_{ij} A_{ij}^2} = \sqrt{\sum_i \sigma_i^2}$.
  • Spectral norm $\|A\|_2 = \sigma_{\max}$.
  • Nuclear norm $\|A\|_* = \sum_i \sigma_i$: the convex envelope of rank (on the spectral-norm unit ball).

Each plays a different role in regularisation: Frobenius for ridge, nuclear for low-rank, spectral for stability constraints (e.g., Lipschitz networks, WGAN).
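
All three are one call away in NumPy and agree with their singular-value definitions; the matrix is a random example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)  # singular values, descending

fro = np.linalg.norm(A, 'fro')          # sqrt of the sum of squared entries
spec = np.linalg.norm(A, 2)             # largest singular value
nuc = np.linalg.norm(A, 'nuc')          # sum of singular values

assert np.isclose(fro, np.sqrt(np.sum(s**2)))
assert np.isclose(spec, s[0])
assert np.isclose(nuc, np.sum(s))
```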
