Linear Algebra Recap

Almost every operation in machine learning is a matrix multiplication. This page is a fast tour of the linear-algebra concepts that recur throughout the rest of the curriculum: vector spaces, the four fundamental subspaces, matrix factorisations (eigendecomposition, SVD), and the geometric interpretations that let you read matrix expressions as transformations rather than indices.

Vectors and inner products

Vectors $x \in \mathbb{R}^n$ have a norm $\|x\|_2 = \sqrt{\sum_i x_i^2}$ and an inner product $\langle x, y \rangle = x^\top y$. The geometric reading: $\langle x, y \rangle = \|x\| \, \|y\| \cos\theta$. Two vectors are orthogonal when $\langle x, y \rangle = 0$.

A basis of $\mathbb{R}^n$ is a set of $n$ linearly independent vectors; an orthonormal basis has unit-norm, pairwise-orthogonal vectors. Coordinates in an orthonormal basis are inner products: $x_i = \langle x, e_i \rangle$.
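
A minimal NumPy sketch of these definitions; the vectors here are made-up examples:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])

norm_x = np.linalg.norm(x)       # ||x||_2 = sqrt(9 + 16) = 5
inner = x @ y                    # <x, y> = -12 + 12 = 0, so x and y are orthogonal
print(norm_x, inner)

# Coordinates in the standard orthonormal basis are inner products <x, e_i>.
e0 = np.array([1.0, 0.0])
assert np.isclose(x @ e0, x[0])
```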

Matrices as linear maps

A matrix $A \in \mathbb{R}^{m \times n}$ is a linear map $\mathbb{R}^n \to \mathbb{R}^m$. Two ways to read $Ax$:

  • Column view: $Ax = \sum_j x_j a_j$ is a linear combination of $A$'s columns.
  • Row view: $(Ax)_i = \langle a_i, x \rangle$ is the inner product of the $i$-th row with $x$.

Matrix multiplication $AB$ composes the maps. Reading dimensions: if $A$ is $m \times n$ and $B$ is $n \times p$, then $AB$ is $m \times p$. The shared dimension $n$ contracts.
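
Both readings, plus the dimension bookkeeping, are easy to verify numerically; the matrices below are arbitrary examples:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # 3x2: a map R^2 -> R^3
x = np.array([1.0, -1.0])

# Column view: Ax is a linear combination of A's columns.
col_view = x[0] * A[:, 0] + x[1] * A[:, 1]

# Row view: (Ax)_i is the inner product of the i-th row with x.
row_view = np.array([row @ x for row in A])

assert np.allclose(A @ x, col_view)
assert np.allclose(A @ x, row_view)

# Composition: (3x2) @ (2x4) -> 3x4; the shared dimension 2 contracts.
B = np.ones((2, 4))
assert (A @ B).shape == (3, 4)
```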

The four fundamental subspaces

Every $A \in \mathbb{R}^{m \times n}$ defines four subspaces:

  • Column space $\mathrm{Col}(A) \subseteq \mathbb{R}^m$: the span of the columns.
  • Row space $\mathrm{Row}(A) = \mathrm{Col}(A^\top) \subseteq \mathbb{R}^n$.
  • Null space $\mathrm{Null}(A) = \{x : Ax = 0\} \subseteq \mathbb{R}^n$.
  • Left null space $\mathrm{Null}(A^\top) \subseteq \mathbb{R}^m$.

Two orthogonality relations: $\mathrm{Row}(A) \perp \mathrm{Null}(A)$ and $\mathrm{Col}(A) \perp \mathrm{Null}(A^\top)$. The rank-nullity theorem, $\mathrm{rank}(A) + \dim \mathrm{Null}(A) = n$, ties them together. These four subspaces are what regression, projection, least squares, and PCA all manipulate.
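
One way to see the subspaces concretely is via the SVD, used here purely as a numerical tool; the rank-1 matrix is a contrived example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # rank 1, so Null(A) has dimension 2

rank = np.linalg.matrix_rank(A)

# Right singular vectors with zero singular value span Null(A).
U, s, Vt = np.linalg.svd(A)
null_basis = Vt[rank:].T         # shape (3, 2)

# Rank-nullity: rank(A) + dim Null(A) = n.
assert rank + null_basis.shape[1] == A.shape[1]

# Row(A) is orthogonal to Null(A): every row annihilates every null vector.
assert np.allclose(A @ null_basis, 0.0)
```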

Eigendecomposition

For square $A \in \mathbb{R}^{n \times n}$, an eigenvector $v$ satisfies $Av = \lambda v$ for some scalar eigenvalue $\lambda$. If $A$ has $n$ linearly independent eigenvectors, it factorises as

$$A = V \Lambda V^{-1},$$

with $V$ the matrix of eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Computing $A^k$ is then trivial: $A^k = V \Lambda^k V^{-1}$.
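
A quick numerical check of both identities, with an arbitrary diagonalisable matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, V = np.linalg.eig(A)        # eigenvalues 3 and 1
assert np.allclose(A, V @ np.diag(lam) @ np.linalg.inv(V))

# A^k = V Λ^k V^{-1}: exponentiate the diagonal instead of multiplying k times.
k = 5
Ak = V @ np.diag(lam**k) @ np.linalg.inv(V)
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```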

For symmetric $A$, the spectral theorem gives orthonormal eigenvectors and real eigenvalues, so $V$ is orthogonal: $A = V \Lambda V^\top$. Symmetric positive-(semi)definite matrices have non-negative eigenvalues; Hessians, covariances, kernels, and Gram matrices all live in this class.
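
For symmetric matrices, `np.linalg.eigh` exploits the spectral theorem directly; a covariance matrix built from random data serves as the example here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
C = np.cov(X.T)                                  # covariance: symmetric PSD

lam, V = np.linalg.eigh(C)                       # real eigenvalues, orthonormal V
assert np.allclose(V.T @ V, np.eye(3))           # V is orthogonal
assert np.all(lam >= -1e-10)                     # PSD: non-negative spectrum
assert np.allclose(C, V @ np.diag(lam) @ V.T)    # A = V Λ Vᵀ
```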

Singular Value Decomposition

For any $A \in \mathbb{R}^{m \times n}$,

$$A = U \Sigma V^\top,$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and $\Sigma \in \mathbb{R}^{m \times n}$ is diagonal with non-negative entries (singular values) in decreasing order. The SVD is the universal matrix factorisation: it always exists, generalises the eigendecomposition to non-square matrices, and is the foundation of:

  • PCA: principal components are the right singular vectors of the centred data matrix (see PCA & SVD).
  • Pseudo-inverse: $A^+ = V \Sigma^+ U^\top$, the right tool for over-/under-determined least squares.
  • Low-rank approximation: the Eckart–Young theorem says the best rank-$k$ approximation is $U_k \Sigma_k V_k^\top$ (the top-$k$ singular triplets).
  • Numerical conditioning: $\sigma_{\max}/\sigma_{\min}$ is the condition number; large values mean inversion is unstable.

SVD is the one matrix factorisation worth being fluent in.
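
The sketch below exercises all four uses on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Pseudo-inverse A+ = V Σ+ Uᵀ (invert the non-zero singular values).
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# Eckart–Young: the best rank-k approximation keeps the top-k triplets.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
assert np.linalg.matrix_rank(A_k) == k

# Condition number: sigma_max / sigma_min.
assert np.isclose(s[0] / s[-1], np.linalg.cond(A))
```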

Norms and conditioning

For a matrix $A$ with singular values $\sigma_i$:

  • Frobenius norm $\|A\|_F = \sqrt{\sum_{ij} A_{ij}^2} = \sqrt{\sum_i \sigma_i^2}$.
  • Spectral norm $\|A\|_2 = \sigma_{\max}$.
  • Nuclear norm $\|A\|_* = \sum_i \sigma_i$: the convex envelope of rank (on the spectral-norm unit ball).

Each plays a different role in regularisation: Frobenius for ridge, nuclear for low-rank, spectral for stability constraints (e.g., Lipschitz networks, WGAN).
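
All three are one call away in NumPy and agree with their singular-value definitions; the matrix is a random example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)  # singular values, descending

fro = np.linalg.norm(A, 'fro')          # sqrt of the sum of squared entries
spec = np.linalg.norm(A, 2)             # largest singular value
nuc = np.linalg.norm(A, 'nuc')          # sum of singular values

assert np.isclose(fro, np.sqrt(np.sum(s**2)))
assert np.isclose(spec, s[0])
assert np.isclose(nuc, np.sum(s))
```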
