k-Nearest Neighbors

k-Nearest Neighbors is the simplest learning algorithm — and one of the most stubborn baselines. To classify a query point, find its k closest training examples by some distance metric and take a vote. To regress, average their targets. There is no training in the usual sense; the entire training set is the model.

The algorithm

Given a training set {(x_i, y_i)} and a query x, the procedure is (a code sketch follows the steps):

  1. Compute the distance d(x, x_i) for every training point.
  2. Pick the k nearest.
  3. Classification: return the majority label.
  4. Regression: return the (optionally weighted) average of the y_i values.
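
A minimal NumPy sketch of these steps, assuming Euclidean distance and a plain majority vote; the function name and shapes are illustrative:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    # 1. distance from the query to every training point (Euclidean)
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # 2. indices of the k nearest training points
    nearest = np.argsort(dists)[:k]
    # 3. classification: majority vote among the k labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
    # 4. regression variant: return y_train[nearest].mean()
```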

The choices that matter (all three appear in the sketch after this list):

  • The distance metric d — Euclidean is the default; Manhattan, cosine, Mahalanobis, or learned distances all see use.
  • The number of neighbours k — small k gives high variance and low bias; large k smooths predictions toward the global average.
  • Weighting by 1/d — closer neighbours get more influence than distant ones.
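
A sketch of those knobs using scikit-learn; the library choice, dataset, and parameter values are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# k = 5 neighbours, Manhattan distance, neighbours weighted by 1/d
clf = KNeighborsClassifier(n_neighbors=5, metric="manhattan", weights="distance")
clf.fit(X_tr, y_tr)                 # "fitting" just stores the training set
print(clf.score(X_te, y_te))        # accuracy on held-out points
```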

Bias-variance behaviour

kNN is the canonical example of the bias-variance trade-off:

  • k=1: zero training error, very flexible, high variance. The decision boundary follows the cell edges of the training set's Voronoi diagram.
  • k=N: predicts the global mean (or majority) for every query — maximum bias, zero variance.
  • Optimal k: somewhere in between, found by cross-validation (a sketch follows this list).
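
A minimal cross-validated sweep over k, again sketched with scikit-learn; the dataset and candidate values are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

scores = {}
for k in [1, 3, 5, 11, 21, 51, 101]:
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold CV accuracy

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```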

Theoretical bound (Cover & Hart, 1967): the asymptotic error of 1-NN is at most twice the Bayes-optimal error, regardless of the metric. This is a remarkable guarantee — without any model assumptions, 1-NN is competitive with any classifier in the limit of infinite data.
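
For the two-class case the bound can be stated precisely, with R* the Bayes risk and R_1NN the asymptotic 1-NN risk:

```latex
R^{*} \;\le\; R_{1\mathrm{NN}} \;\le\; 2R^{*}\,(1 - R^{*}) \;\le\; 2R^{*}
```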

The curse of dimensionality

In d-dimensional spaces with large d, things break:

  • Distances concentrate. For random points in high dimensions, the nearest and farthest neighbours are at almost the same distance, so the "nearest neighbour" stops being informative (the simulation after this list makes this concrete).
  • Volume grows exponentially. Filling a d-dimensional unit ball at a per-axis sampling density ρ requires about ρ^d samples — exponentially many.
  • Most of the volume is at the surface. In high d, almost all of a ball's volume sits near its surface, not its centre.
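
A quick simulation of distance concentration (illustrative; uniform random points and Euclidean distance):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(1000, d))        # 1000 random points in the unit cube
    q = rng.uniform(size=d)                # one random query point
    dists = np.linalg.norm(X - q, axis=1)
    # the nearest/farthest ratio creeps toward 1 as d grows
    print(d, round(dists.min() / dists.max(), 3))
```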

The practical consequences: kNN works well in low-to-medium dimensions (roughly d < 50) and degrades rapidly past that. PCA pre-projection or learned embeddings before kNN are common fixes — match in a lower-dimensional space where distances are meaningful.
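
A sketch of the PCA-then-kNN fix with scikit-learn; the dataset, target dimension, and k are illustrative assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)        # 64-dimensional pixel features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# project to 20 dimensions first, then match neighbours where distances behave
model = make_pipeline(StandardScaler(), PCA(n_components=20),
                      KNeighborsClassifier(n_neighbors=10))
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```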

Computational cost

Naive kNN inference is O(Nd) per query — every query touches every training point. For modern datasets this is too slow. Standard accelerators:

  • kd-trees — binary trees that recursively split feature space. Effective up to d ≈ 20, then degrade to brute force (see the sketch after this list).
  • Ball trees — analogous structure with hypersphere bounds; better than kd-trees in moderate dimensions.
  • Approximate nearest neighbour (ANN) — graph- and quantisation-based methods such as HNSW, implemented in libraries like FAISS and ScaNN, that trade exact correctness for sub-linear query time. The standard tool for billion-vector retrieval.
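
For the exact tree structures, scikit-learn exposes both behind one interface (a sketch; the data shape and parameters are illustrative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 8))      # low-dimensional data, where trees actually help

# algorithm can be "kd_tree", "ball_tree", or "brute"
nn = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X)
dists, idx = nn.kneighbors(rng.normal(size=(3, 8)))   # 3 query points
print(idx.shape)    # (3, 5): indices of the 5 nearest training points per query
```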

For modern systems, ANN is what matters: every retrieval-augmented LLM (RAG), embedding-based recommender, and image-search backend is fundamentally a kNN system over a learned embedding space.
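
A minimal ANN sketch with FAISS (assuming the faiss-cpu package; the index type, dimensions, and parameters are illustrative):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                            # embedding dimension
xb = np.random.rand(100_000, d).astype("float32")  # database vectors
xq = np.random.rand(5, d).astype("float32")        # query vectors

index = faiss.IndexHNSWFlat(d, 32)   # HNSW graph index, 32 links per node, L2 metric
index.add(xb)                        # "training" is just indexing the vectors

D, I = index.search(xq, 10)          # approximate top-10 neighbours per query
print(I.shape)                       # (5, 10): neighbour indices
```

Swap a trained embedding model's vectors in for the random ones and this is, structurally, the retriever inside a RAG pipeline.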

When kNN wins

kNN is rarely the best classifier on benchmarks but consistently strong as a baseline:

  • Low-dimensional, modest data — competitive without tuning.
  • Local structure matters — when the function changes character across the input space, kNN naturally adapts. Linear models cannot.
  • Retrieval / embedding-based systems — kNN over learned embeddings is the workhorse of modern recommendation, semantic search, and RAG.
  • Manifold Learning — t-SNE / UMAP rely on local neighbour graphs.
  • PCA & SVD — the standard pre-projection for kNN in high d.
  • RAG — kNN over text embeddings is the retriever.
