The Kernel Era (1995–2010)
For about 15 years, kernel methods were the dominant paradigm in machine learning. SVMs, kernel ridge regression, kernel PCA, Gaussian processes — all built on the same idea of replacing inner products with kernels — were the state of the art on most benchmarks from 1995 to 2010 and the conceptual centre of academic ML. Then deep learning displaced them, but the era's ideas, vocabulary, and pedagogical clarity survive.
What started it
Three papers, beginning in 1992, set the stage:
- Boser, Guyon & Vapnik (1992) — A Training Algorithm for Optimal Margin Classifiers — introduced the maximum-margin SVM and the kernel trick.
- Schölkopf et al. (1997) — Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers — showed RBF-kernel SVMs beat hand-crafted RBF networks.
- Cortes & Vapnik (1995) — Support Vector Networks — the soft-margin SVM, removing the linear-separability requirement.
By 1998, kernel-methods workshops were a regular fixture at NIPS (now NeurIPS). By 2002, Schölkopf and Smola's Learning with Kernels was the canonical textbook.
The core insight
Once you've reformulated an algorithm to depend on the data only through inner products ⟨x, x′⟩, you can replace every inner product with a kernel k(x, x′) = ⟨φ(x), φ(x′)⟩. The algorithm then works implicitly in the (possibly infinite-dimensional) feature space of φ without ever computing φ — the kernel trick.
This applies to:
- Classification (SVM).
- Regression (kernel ridge, Gaussian processes).
- Clustering (kernel k-means, spectral clustering).
- Dimensionality reduction (kernel PCA).
- Anomaly detection (one-class SVM).
- Independence testing (HSIC, kernel two-sample tests).
A whole sub-field formed around designing kernels for structured objects — strings, trees, graphs, sets, sequences — while leaving the core algorithm unchanged.
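The simplest structured-object example is a p-spectrum string kernel, which counts matching length-p substrings; it is an inner product of substring-count feature vectors, so any kernel algorithm can consume strings directly (a minimal sketch; p = 2 is an illustrative choice):

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """k(s, t) = sum over length-p substrings g of count_s(g) * count_t(g).

    An inner product of count vectors, hence a valid positive-semidefinite
    kernel on strings of any length.
    """
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(cs[g] * ct[g] for g in cs)

# "banana" -> {ba:1, an:2, na:2}; "ananas" -> {an:2, na:2, as:1}
print(spectrum_kernel("banana", "ananas"))  # 2*2 + 2*2 = 8
```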
Theoretical legitimacy
Kernel methods came with statistical-learning-theory guarantees:
- Convex optimisation — global optima, polynomial-time training.
- Margin-based generalisation bounds — the wider the margin, the better the generalisation.
- VC theory — finite VC dimension despite infinite feature space (when properly regularised).
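The margin bound in the second point can be stated schematically (constants and log factors suppressed; this is the standard dimension-free form from learning-theory texts, not a formula from the original):

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% a classifier achieving margin \gamma on data in a ball of radius R
% satisfies, schematically:
\mathrm{err}(f) \;\lesssim\; \sqrt{\frac{R^2/\gamma^2}{n}}
  \;+\; \sqrt{\frac{\log(1/\delta)}{n}}
```

The point is that the bound depends on the data radius R and the margin γ, not on the dimension of the feature space — which is why infinite-dimensional kernels are safe.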
For an academic field that valued mathematical rigour, this was deeply attractive. SVMs were a theorem-friendly algorithm in a way neural networks of the era were not.
Practical strengths
- Strong out-of-the-box performance on small/medium structured data — text categorisation, gene-expression classification, hand-written digit recognition, OCR.
- Limited tuning required — the bandwidth hyperparameter and regularisation strength are usually all you need to cross-validate.
- Mature software ecosystem — LIBSVM and its derivatives, with bindings in R and Python (scikit-learn's SVC wraps LIBSVM).
- Modular — pick a kernel for your data type, plug into any kernel algorithm.
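The "limited tuning" claim can be made concrete: a small grid over the bandwidth and regularisation strength, scored by k-fold cross-validation, is often the whole tuning story (a self-contained numpy sketch using kernel ridge regression; the grid values are illustrative):

```python
import numpy as np

def rbf(A, B, gamma):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def cv_error(X, y, gamma, lam, k=5):
    """Mean squared validation error of kernel ridge regression over k folds."""
    n = len(X)
    errs = []
    for val in np.array_split(np.arange(n), k):
        tr = np.setdiff1d(np.arange(n), val)
        K = rbf(X[tr], X[tr], gamma)
        alpha = np.linalg.solve(K + lam * np.eye(len(tr)), y[tr])
        pred = rbf(X[val], X[tr], gamma) @ alpha
        errs.append(np.mean((pred - y[val]) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=150)

# The entire hyperparameter search: two knobs, one grid.
grid = [(g, l) for g in (0.1, 1.0, 10.0) for l in (1e-4, 1e-2, 1.0)]
best = min(grid, key=lambda gl: cv_error(X, y, *gl))
print("best (gamma, lam):", best)
```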
What killed it
Two things, simultaneously:
- Scaling. The Gram matrix is n × n, so memory grows as O(n²) and exact training as O(n³); past a few tens of thousands of examples, both memory and per-query cost become prohibitive. Approximations (Nyström, random Fourier features, sparse SVMs) introduced their own errors.
- Representation learning. Kernels encode a fixed feature space chosen by the practitioner. Deep networks learn their feature space from data — and on tasks with abundant data and natural structure (vision, audio, NLP), the learned features dominate any hand-chosen kernel.
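Of those approximations, random Fourier features are the simplest to sketch: for the RBF kernel, a random map z of dimension D satisfies z(x) · z(y) ≈ k(x, y), replacing the n × n Gram matrix with an n × D feature matrix (a minimal sketch following Rahimi & Recht; `gamma` and `D` are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, D, d = 0.5, 2000, 3   # kernel bandwidth, feature dimension, input dimension

# For k(x, y) = exp(-gamma * ||x - y||^2), Bochner's theorem says to sample
# frequencies from the kernel's spectral density, N(0, 2 * gamma * I).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def z(x):
    """Random feature map with E[z(x) @ z(y)] = k(x, y)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = z(x) @ z(y)
print(exact, approx)  # should be close for large D (error shrinks like 1/sqrt(D))
```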
The 2012 ImageNet result was the public turning point. By 2014, CNNs dominated the major vision conferences; by 2018, NLP had largely transitioned as well. Kernel methods retreated to small-data, structured-input, or theory-of-deep-learning niches.
What survived
- As a baseline — RBF SVMs are still a strong baseline in tabular and small-data settings.
- Gaussian processes — Bayesian regression with kernel covariance, used in Bayesian optimisation, robotics, hyperparameter tuning.
- Spectral methods — kernel PCA, spectral clustering, manifold learning, Laplacian eigenmaps.
- The Neural Tangent Kernel — wide neural networks behave like kernel methods (Jacot et al., 2018), giving a bridge between the two paradigms.
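The Gaussian-process survivor is easy to make concrete: the posterior mean and variance at a test point are closed-form expressions in the kernel (a minimal numpy sketch of noisy GP regression; the kernel, noise level, and test points are illustrative):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def gp_posterior(X, y, X_star, noise=1e-2, gamma=1.0):
    """Closed-form GP regression: posterior mean and pointwise variance."""
    K = rbf(X, X, gamma) + noise * np.eye(len(X))
    K_s = rbf(X_star, X, gamma)
    mean = K_s @ np.linalg.solve(K, y)
    cov = rbf(X_star, X_star, gamma) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.cos(X[:, 0])
X_star = np.array([[0.0], [10.0]])   # one point near the data, one far away
mean, var = gp_posterior(X, y, X_star)
print(mean, var)  # variance is tiny near the data and reverts to ~1 far away
```

The calibrated "I don't know" far from the data is what makes GPs the default surrogate model in Bayesian optimisation and hyperparameter tuning.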
The kernel era's pedagogical legacy is durable: kernels remain the cleanest way to teach high-dimensional generalisation, the role of regularisation, and the connection between optimisation and statistics.
What to read next
- Kernel Methods — the technical core.
- SVM — the era's flagship algorithm.
- Deep Learning Renaissance — what came next.