k-Nearest Neighbors
k-Nearest Neighbors is the simplest learning algorithm — and one of the most stubborn baselines. To classify a query point, find its $k$ nearest training points and let them vote.
The algorithm
Given a training set $\{(x_i, y_i)\}_{i=1}^{n}$ and a query point $x$ (a code sketch follows the list):
- Compute the distance $d(x, x_i)$ for every training point.
- Pick the $k$ nearest.
- Classification: return the majority label.
- Regression: return the (optionally weighted) average of their $y_i$ values.
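A minimal NumPy sketch of these steps (the function name and toy data are mine, not from any particular library):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify one query point by majority vote among its k nearest neighbours."""
    # Step 1: Euclidean distance from the query to every training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Step 2: indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Step 3: majority label among those k neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two 2-D clusters, one query point near the second cluster.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # -> 1
```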
The choices that matter:
- The distance metric $d$ — Euclidean is the default; Manhattan, cosine, Mahalanobis, or learned distances all see use.
- The number of neighbours $k$ — small $k$ gives high variance and low bias; large $k$ smooths predictions toward the global average.
- Weighting by distance — closer neighbours get more influence than distant ones (all three choices appear in the sketch below).
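These knobs map directly onto constructor arguments in scikit-learn's `KNeighborsClassifier`; a small sketch with synthetic data (the particular values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(
    n_neighbors=15,       # the number of neighbours k
    metric="manhattan",   # distance metric (the default is Euclidean)
    weights="distance",   # closer neighbours get more influence
)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out accuracy
```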
Bias-variance behaviour
kNN is the canonical example of the bias-variance trade-off:
- $k = 1$: zero training error, very flexible, high variance. The decision boundary is the Voronoi diagram of the training set.
- $k = n$: predicts the global mean (or majority) for every query — maximum bias, zero variance.
- Optimal $k$: somewhere in between, found by cross-validation (sketched below).
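A sketch of that cross-validation sweep, assuming scikit-learn and an arbitrary grid of candidate $k$ values:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Small k: low bias, high variance; large k: the reverse. Let CV pick.
candidate_ks = [1, 3, 5, 9, 15, 25, 51]
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in candidate_ks
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```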
Theoretical bound (Cover & Hart, 1967): the asymptotic error of 1-NN is at most twice the Bayes-optimal error, regardless of the distance metric. This is a remarkable guarantee — without any model assumptions, 1-NN is competitive with any classifier in the limit of infinite data.
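For binary classification the bound can be written out. With $R^{*}$ the Bayes error and $R_{1\text{-}\mathrm{NN}}$ the asymptotic 1-NN error rate (the multi-class version differs slightly):

$$
R^{*} \;\le\; R_{1\text{-}\mathrm{NN}} \;\le\; 2R^{*}\,(1 - R^{*}) \;\le\; 2R^{*}.
$$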
The curse of dimensionality
In high dimensions, several things go wrong (a short simulation after this list makes the first point concrete):
- Distances concentrate. For random points in high dimensions, the nearest and farthest neighbours are at almost the same distance. The "nearest neighbour" stops being informative.
- Volume grows exponentially. Filling a $d$-dimensional unit ball at a fixed sampling resolution $\varepsilon$ requires on the order of $\varepsilon^{-d}$ samples — exponentially many.
- Most of the volume is at the surface. In high $d$, almost all of a ball's volume sits near its surface, not its centre.
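A short simulation shows the concentration effect directly: draw random points in the unit cube and compare the nearest and farthest distances from a random query. The exact ratios depend on the draw; the trend toward 1 does not.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(1000, d))   # 1000 random points in [0, 1]^d
    q = rng.uniform(size=d)           # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    # Ratio of nearest to farthest distance: near 0 in low d, near 1 in high d.
    print(d, round(dists.min() / dists.max(), 3))
```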
The practical consequences: kNN works well in low/medium dimensions (roughly $d \lesssim 20$ on raw features); in high dimensions it needs a pre-projection (PCA) or a learned embedding space to stay useful.
Computational cost
Naive kNN inference is $O(nd)$ per query: the distance to every one of the $n$ training points must be computed. For large $n$ this is too slow, so index structures are used:
- kd-trees — binary trees that recursively split feature space. Effective up to roughly $d \approx 20$, then degrade to brute force.
- Ball trees — an analogous structure with hypersphere bounds; better than kd-trees in moderate dimensions (both are exercised in the sketch after this list).
- Approximate nearest neighbour (ANN) — algorithms like FAISS, ScaNN, HNSW that trade exact correctness for sub-linear query time. The standard tool for billion-vector retrieval.
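In scikit-learn the exact-search structures sit behind a single `algorithm` switch; a sketch with synthetic low-dimensional data (sizes arbitrary), where all three return identical neighbours and differ only in speed:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(50_000, 3)  # low-dimensional, where trees actually pay off

for algo in ["brute", "kd_tree", "ball_tree"]:
    nn = NearestNeighbors(n_neighbors=10, algorithm=algo).fit(X)
    dists, idx = nn.kneighbors(X[:5])  # exact 10-NN of the first five points
    print(algo, idx[0][:3])
```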
For modern systems, ANN is what matters: every retrieval-augmented LLM (RAG), embedding-based recommender, and image-search backend is fundamentally a kNN system over a learned embedding space.
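A minimal FAISS sketch contrasting exact and approximate search (the dimensions and HNSW parameter are arbitrary, and `faiss` is a separate install):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                              # embedding dimension
xb = np.random.rand(100_000, d).astype("float32")    # database vectors
xq = np.random.rand(5, d).astype("float32")          # query vectors

# Exact brute-force kNN under L2 distance.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
D_exact, I_exact = flat.search(xq, 10)

# Approximate kNN with an HNSW graph: sub-linear queries, near-exact recall.
hnsw = faiss.IndexHNSWFlat(d, 32)  # 32 = graph links per node
hnsw.add(xb)
D_approx, I_approx = hnsw.search(xq, 10)
print(I_exact[0][:5], I_approx[0][:5])
```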
When kNN wins
kNN is rarely the best classifier on benchmarks but consistently strong as a baseline:
- Low-dimensional, modest data — competitive without tuning.
- Local structure matters — when the function changes character across the input space, kNN naturally adapts. Linear models cannot.
- Retrieval / embedding-based systems — kNN over learned embeddings is the workhorse of modern recommendation, semantic search, and RAG.
What to read next
- Manifold Learning — t-SNE / UMAP rely on local neighbour graphs.
- PCA & SVD — the standard pre-projection for kNN in high $d$.
- RAG — kNN over text embeddings is the retriever.