Learning Paradigms
Machine-learning settings are usually classified by what signal the learner receives. Supervised, unsupervised, and reinforcement learning are the three classical categories, but modern ML increasingly blends them — self-supervised learning sits between supervised and unsupervised; offline RL between supervised and reinforcement; semi-supervised between supervised and unsupervised. This page is the map.
Supervised learning
Each training example is a pair (x, y): an input x and a label y. Two canonical flavours:
- Classification — discrete y. Loss: cross-entropy or hinge.
- Regression — continuous y. Loss: squared error or Huber.
Almost every named ML algorithm lives here: OLS, logistic regression, SVM, decision trees, CNNs, Transformers fine-tuned on labelled data. The label is what the learner is optimising for.
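To make the setting concrete, here is a minimal sketch of supervised learning from scratch: logistic regression trained by full-batch gradient descent on the cross-entropy loss. The two-blob dataset is synthetic, invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy dataset: two Gaussian blobs, class 0 around (-1,-1), class 1 around (+1,+1).
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = np.zeros(2), 0.0
for _ in range(500):                       # plain full-batch gradient descent
    p = 1 / (1 + np.exp(-(X @ w + b)))     # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)        # gradient of mean cross-entropy
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

preds = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = np.mean(preds == y)
```

Every supervised method follows this shape — a loss comparing predictions to labels, minimised over parameters — however elaborate the model.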
Unsupervised learning
Examples are unlabelled — just a set of inputs x_1, …, x_n with no targets attached. The goal is to find structure: clusters, low-dimensional representations, densities.
The key tension: what is structure? Without labels there is no objective ground truth — every choice of method encodes a different prior about what "structure" means. PCA finds high-variance directions; k-means finds spherical clusters; t-SNE preserves local neighbourhoods. Each can be the wrong tool for a given task.
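The "high-variance directions" prior of PCA can be shown in a few lines: the principal directions of centred data are its right singular vectors. The stretched toy data below is synthetic, chosen so one direction dominates.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data with variance concentrated along the first axis (std 5 vs 0.5).
X = rng.normal(size=(200, 2)) * np.array([5.0, 0.5])
Xc = X - X.mean(axis=0)                                # centre the data

# Right singular vectors of the centred data are the principal directions.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
first_pc = Vt[0]                                       # highest-variance direction
explained = s[0] ** 2 / np.sum(s ** 2)                 # fraction of variance explained
```

PCA "succeeds" here only because the data really does vary along one axis — swap in ring-shaped data and the same code returns a direction that explains nothing, which is the tension above in miniature.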
Reinforcement learning
The learner is an agent that interacts with an environment, receiving rewards. See the RL track for full coverage. The learning signal is the reward — sparser than supervised labels, harder to optimise (the agent must explore), but applicable to sequential-decision problems supervised learning cannot handle.
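The reward-as-signal idea can be sketched with tabular Q-learning on a toy five-state chain (an environment invented here for illustration): the only reward sits at the far end, and the agent must propagate that sparse signal backwards through its value estimates.

```python
import numpy as np

n_states = 5                           # states 0..4; reward only on reaching state 4
Q = np.zeros((n_states, 2))            # action 0 = left, 1 = right
rng = np.random.default_rng(2)

for _ in range(500):                   # episodes with a uniform-random behaviour policy
    s = 0
    for _ in range(20):                # steps per episode
        a = int(rng.integers(2))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: bootstrap from the best action in the next state.
        Q[s, a] += 0.5 * (r + 0.9 * Q[s_next].max() - Q[s, a])
        s = s_next
        if r > 0:
            break                      # episode ends at the rewarded state

greedy = [int(Q[s].argmax()) for s in range(n_states - 1)]
```

After training, the greedy policy moves right from every state — the sparse terminal reward has been backed up through the chain, which is exactly the credit-assignment work a supervised label would have done for free.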
Self-supervised learning
The dominant paradigm of modern foundation-model pretraining. Generate "labels" from the data itself:
- Masked language modelling (BERT) — predict masked words from context.
- Next-token prediction (GPT) — predict the next word from preceding context.
- Contrastive learning (SimCLR, CLIP) — predict whether two views of the same image (or image-caption pair) match.
- Masked autoencoding (MAE) — reconstruct masked image patches.
Self-supervised methods get effectively unlimited supervision because the label is computable from the input. They are structurally supervised (a well-defined ERM problem with a loss) but semantically unsupervised (no human labelling). This recipe underlies every modern LLM, vision foundation model, and multimodal pretraining pipeline.
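"The label is computable from the input" is literal. For next-token prediction, the entire supervision signal is a one-position shift of the raw sequence — the toy token ids below are invented for illustration:

```python
# Next-token prediction: (input, target) pairs come from shifting the
# sequence by one position. No human labelling anywhere.
tokens = [3, 7, 7, 1, 9, 2]            # toy token ids
inputs, targets = tokens[:-1], tokens[1:]
pairs = list(zip(inputs, targets))     # each context token predicts its successor
```

Masked language modelling works the same way with a different corruption: hide some tokens and use the originals as targets.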
Semi-supervised learning
A mix: a small labelled set plus a large unlabelled set. Standard techniques:
- Pseudo-labelling — train on the labelled set, predict labels for the unlabelled set, then retrain on the union with the pseudo-labels.
- Consistency regularisation — augment unlabelled examples and require the model to make consistent predictions across augmentations.
- MixMatch (Berthelot et al., 2019) / FixMatch (Sohn et al., 2020) — combine the two with strong augmentation.
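A stripped-down sketch of pseudo-labelling, with a nearest-centroid classifier standing in for the model and synthetic 1-D data standing in for the task (both are assumptions for illustration; real pipelines also keep only high-confidence pseudo-labels):

```python
import numpy as np

rng = np.random.default_rng(3)
# Tiny labelled set: one example per class.
X_lab = np.array([[-1.0], [1.0]])
y_lab = np.array([0, 1])
# Large unlabelled set: a noisy mixture around -1 and +1.
X_unlab = rng.normal(0, 1, (100, 1)) + np.sign(rng.normal(size=(100, 1)))

def centroids(X, y):
    return np.array([X[y == c].mean() for c in (0, 1)])

# Step 1: "train" on the labelled set (here: compute class centroids).
c = centroids(X_lab, y_lab)
# Step 2: pseudo-label the unlabelled set by nearest centroid.
pseudo = (np.abs(X_unlab[:, 0] - c[1]) < np.abs(X_unlab[:, 0] - c[0])).astype(int)
# Step 3: retrain on the labelled + pseudo-labelled union.
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, pseudo])
c_new = centroids(X_all, y_all)
```

The retrained centroids are estimated from 102 points instead of 2 — the unlabelled data sharpened the decision boundary without a single extra human label.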
In the foundation-model era, semi-supervised learning is largely subsumed by self-supervised pretraining + supervised fine-tuning — the pretrained model already encodes most of what semi-supervision used to provide.
Offline / batch learning vs online
Orthogonal axis: does the learner see all data up front, or stream it?
- Offline — full dataset available. The default in this curriculum.
- Online — data arrives one example at a time; model must adapt incrementally. Streaming, time-series, recommender systems.
- Active learning — learner chooses which examples get labelled, querying an oracle. Useful when labels are expensive (medical imaging, expert annotation).
Offline RL is a special case where the agent cannot interact further and must learn purely from a logged dataset.
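The offline/online contrast is easiest to see in code. Below, a perceptron processes a simulated stream one example at a time, updating only on mistakes and never storing past data — the stream and target concept are both invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
w = np.zeros(2)
mistakes = 0
for _ in range(1000):                          # simulated stream, one example at a time
    x = rng.normal(size=2)
    y = 1 if x[0] + x[1] > 0 else -1           # true concept: sign(x0 + x1)
    if y * (w @ x) <= 0:                       # mistake-driven perceptron update
        w += y * x
        mistakes += 1

# Error rate on 200 fresh examples from the same stream.
errs = 0
for _ in range(200):
    x = rng.normal(size=2)
    y = 1 if x[0] + x[1] > 0 else -1
    if y * (w @ x) <= 0:
        errs += 1
error_rate = errs / 200
```

Mistakes concentrate early and become rare as the weight vector rotates toward the true concept; an offline learner would instead see the whole dataset at once and fit in one pass.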
Transfer, multi-task, and meta-learning
Three related ideas about reusing learning across problems:
- Transfer learning — pretrain on one task, fine-tune on another. The standard recipe in NLP and vision.
- Multi-task learning — train one model on many tasks simultaneously, sharing parameters. T5 made this the default for NLP.
- Meta-learning — learn how to learn from a distribution of tasks. MAML (Finn et al., 2017) is the reference method; modern in-context learning in LLMs is a form of meta-learning that emerges from scale.
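The transfer-learning recipe — freeze a pretrained backbone, fit only a new head — can be sketched in miniature. The "pretrained" extractor below is a stand-in random projection and the downstream data is synthetic; only the structure (frozen features, trainable linear probe) is the point.

```python
import numpy as np

rng = np.random.default_rng(5)
W_pre = rng.normal(size=(2, 8))                 # stand-in "pretrained" weights, frozen

def features(X):                                # frozen backbone: never updated
    return np.tanh(X @ W_pre)

# Downstream labelled task: two synthetic blobs.
X = np.vstack([rng.normal(-1, 0.4, (40, 2)), rng.normal(1, 0.4, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

# Fine-tune only the head: a least-squares linear probe on frozen features.
F = np.c_[features(X), np.ones(len(X))]         # features + bias column
head, *_ = np.linalg.lstsq(F, y.astype(float), rcond=None)
preds = (F @ head > 0.5).astype(int)
accuracy = np.mean(preds == y)
```

Only the head's 9 parameters were fit on the downstream labels — the division of labour behind "pretrain once, fine-tune everywhere."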
Choosing a paradigm
The task usually picks for you:
- Have abundant labels? Supervised.
- Want to discover structure? Unsupervised.
- Sequential decisions with reward? Reinforcement.
- Cheap data + expensive labels? Self-supervised pretraining + supervised fine-tuning, in that order.
What to read next
- ERM — the formal core of supervised learning.
- PAC Learning — the theoretical framework for sample complexity.
- RL Overview — the reinforcement track.