
Learning Paradigms

Machine-learning settings are usually classified by what signal the learner receives. Supervised, unsupervised, and reinforcement learning are the three classical categories, but modern ML increasingly blends them — self-supervised learning sits between supervised and unsupervised; offline RL between supervised and reinforcement; semi-supervised between supervised and unsupervised. This page is the map.

Supervised learning

Each training example is a pair (x, y) — input and target label. The learner finds a hypothesis h : X → Y via empirical risk minimization: pick the h that minimises the average loss over the training pairs. Two flavours:

  • Classification — discrete y. Loss: cross-entropy or hinge.
  • Regression — continuous y. Loss: squared error or Huber.

Almost every named ML algorithm lives here: OLS, logistic regression, SVMs, decision trees, CNNs, Transformers fine-tuned on labelled data. The label defines the objective the learner optimises.
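The ERM recipe above can be made concrete with the simplest supervised learner: linear regression under squared-error loss, where the empirical-risk minimiser has a closed form. A minimal sketch on synthetic data (the target function y = 2x + 1 is made up for illustration):

```python
import numpy as np

# Hypothetical toy data: y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)

# ERM with squared-error loss: the minimiser is the ordinary-least-squares
# solution of the normal equations.
A = np.hstack([X, np.ones((len(X), 1))])        # add an intercept column
w, *_ = np.linalg.lstsq(A, y, rcond=None)

empirical_risk = np.mean((A @ w - y) ** 2)       # average training loss
print(w)                                         # close to [2.0, 1.0]
print(empirical_risk)                            # close to the noise variance
```

Swapping the loss (hinge, cross-entropy) or the hypothesis class (trees, networks) changes the algorithm, but the objective — minimise average loss on labelled pairs — is the same.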

Unsupervised learning

Examples are unlabelled — just a set {x1, …, xN}. The learner discovers structure: clusters (k-means, GMM, hierarchical), low-dimensional representations (PCA, t-SNE, UMAP), density estimates (KDE, GMM, normalising flows), or generative models (VAE, GAN, diffusion).

The key tension: what is structure? Without labels there is no objective ground truth — every choice of method encodes a different prior about what "structure" means. PCA finds high-variance directions; k-means finds spherical clusters; t-SNE preserves local neighbourhoods. Each can be the wrong tool for a given task.
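The different priors can be seen side by side on the same data. A sketch, assuming synthetic two-cluster data (the locations and a naive hand-rolled Lloyd's iteration are illustrative, not a library implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical spherical clusters at (-2, 0) and (+2, 0).
a = rng.normal(loc=[-2.0, 0.0], scale=0.3, size=(50, 2))
b = rng.normal(loc=[+2.0, 0.0], scale=0.3, size=(50, 2))
X = np.vstack([a, b])

# PCA's prior: "structure" = directions of maximal variance.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
print(Vt[0])          # first principal axis: essentially the x-direction

# k-means' prior: "structure" = compact spherical clusters.
# Naive Lloyd's algorithm, k=2, initialised on two data points for brevity.
centers = X[[0, 50]].copy()
for _ in range(10):
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
print(centers)        # near (-2, 0) and (+2, 0)
```

Both methods "find" the two-cluster geometry here — but on elongated or nested clusters, PCA and k-means would disagree, because each encodes a different notion of structure.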

Reinforcement learning

The learner is an agent that interacts with an environment, receiving rewards. See the RL track for full coverage. The learning signal is the reward — sparser than supervised labels, harder to optimise (the agent must explore), but applicable to sequential-decision problems supervised learning cannot handle.
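To make the sparse-reward point concrete, here is a minimal tabular Q-learning sketch on a hypothetical 5-state chain: the agent starts at state 0, actions step left or right, and the only learning signal is a reward of +1 for reaching state 4 (the environment, constants, and random behaviour policy are all assumptions for illustration):

```python
import numpy as np

N, GAMMA, ALPHA = 5, 0.9, 0.5
rng = np.random.default_rng(0)
Q = np.zeros((N, 2))                        # Q[state, action]

for _ in range(500):                        # episodes
    s = 0
    while s != N - 1:
        a = int(rng.integers(2))            # uniform random behaviour policy
        s2 = min(max(s + (1 if a else -1), 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0     # reward only at the goal state
        # Off-policy one-step temporal-difference update.
        Q[s, a] += ALPHA * (r + GAMMA * Q[s2].max() - Q[s, a])
        s = s2

print(Q[:N - 1].argmax(axis=1))             # learned greedy policy: always "right"
```

No example ever says "the correct action in state 2 is right" — the agent must back that up from a delayed reward, which is exactly what distinguishes the RL signal from a supervised label.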

Self-supervised learning

The dominant paradigm of modern foundation-model pretraining. Generate "labels" from the data itself:

  • Masked language modelling (BERT) — predict masked words from context.
  • Next-token prediction (GPT) — predict the next word from preceding context.
  • Contrastive learning (SimCLR, CLIP) — predict whether two views of the same image (or image-caption pair) match.
  • Masked autoencoding (MAE) — reconstruct masked image patches.

Self-supervised methods get effectively unlimited supervision because the label is computable from the input. They are structurally supervised (a well-defined ERM problem with a loss) but semantically unsupervised (no human labelling). This is the recipe behind modern LLMs, foundation vision models, and multimodal pretraining.
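The "label computed from the input" idea is easiest to see in next-token prediction. A toy sketch using a bigram count model (the corpus sentence and greedy decoder are hypothetical; real models replace the counts with a Transformer):

```python
from collections import Counter, defaultdict

# Self-supervision: every (input, target) pair is derived from the raw text
# itself -- no human labelling step.
corpus = "the cat sat on the mat because the cat was tired".split()

# Each position yields one training pair (context token, next token).
pairs = list(zip(corpus[:-1], corpus[1:]))

counts = defaultdict(Counter)
for ctx, nxt in pairs:
    counts[ctx][nxt] += 1

def predict_next(token):
    """Greedy next-token prediction from bigram counts."""
    return counts[token].most_common(1)[0][0]

print(pairs[0])              # ('the', 'cat')
print(predict_next("the"))   # 'cat' -- follows 'the' twice, 'mat' only once
```

Masked modelling, contrastive learning, and masked autoencoding follow the same pattern: a deterministic transformation of the data manufactures the supervision.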

Semi-supervised learning

A mix: a small labelled set L and a large unlabelled set U. Methods exploit U to improve a model trained primarily on L:

  • Pseudo-labelling — train on L, predict labels for U, retrain on the union with pseudo-labels.
  • Consistency regularisation — augment unlabelled examples and require the model to make consistent predictions across augmentations.
  • MixMatch (Berthelot et al., 2019) and FixMatch (Sohn et al., 2020) — combine the two, pairing pseudo-labels with strong augmentation.
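The pseudo-labelling loop can be sketched in a few lines. A toy version, assuming synthetic two-class data and a nearest-centroid classifier as the stand-in model (all names and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two Gaussian classes; only 4 points carry labels.
X0 = rng.normal([-2.0, 0.0], 0.5, (50, 2))
X1 = rng.normal([+2.0, 0.0], 0.5, (50, 2))
X_lab = np.array([[-2.0, 0.0], [-1.8, 0.2], [2.0, 0.0], [1.9, -0.1]])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.vstack([X0, X1])                 # large unlabelled pool U

def fit_centroids(X, y):
    return np.array([X[y == k].mean(axis=0) for k in (0, 1)])

def predict(centroids, X):
    return np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

c = fit_centroids(X_lab, y_lab)             # 1) train on the small set L
y_pseudo = predict(c, X_unl)                # 2) pseudo-label U
c = fit_centroids(np.vstack([X_lab, X_unl]),          # 3) retrain on the
                  np.concatenate([y_lab, y_pseudo]))  #    union

print(predict(c, np.array([[-2.5, 0.1], [2.5, -0.2]])))  # [0 1]
```

The retrained centroids are estimated from 104 points instead of 4, which is the whole bet of semi-supervision: the unlabelled pool sharpens a decision boundary the labels only roughly place.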

In the foundation-model era, semi-supervised learning is largely subsumed by self-supervised pretraining + supervised fine-tuning — the pretrained model already encodes most of what semi-supervision used to provide.

Offline / batch learning vs online

Orthogonal axis: does the learner see all data up front, or stream it?

  • Offline — full dataset available. The default in this curriculum.
  • Online — data arrives one example at a time; model must adapt incrementally. Streaming, time-series, recommender systems.
  • Active learning — learner chooses which examples get labelled, querying an oracle. Useful when labels are expensive (medical imaging, expert annotation).
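The offline/online distinction is visible directly in code: an online learner touches each example once and discards it. A sketch of streaming SGD on a noiseless linear target (the target function and learning rate are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = 0.0, 0.0
lr = 0.1

for _ in range(2000):                 # the "stream": one example at a time
    x = rng.uniform(-1, 1)
    y = 3.0 * x - 0.5                 # hypothetical target function
    err = (w * x + b) - y             # error on this single example
    w -= lr * err * x                 # one SGD step on the squared loss,
    b -= lr * err                     # then the example is thrown away
print(round(w, 2), round(b, 2))       # approaches 3.0 and -0.5
```

An offline learner would instead store all 2000 pairs and solve for (w, b) in one batch; the online version needs O(1) memory, which is what makes it viable for streams and recommenders.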

Offline RL is a special case where the agent cannot interact further and must learn purely from a logged dataset.

Transfer, multi-task, and meta-learning

Three related ideas about reusing learning across problems:

  • Transfer learning — pretrain on one task, fine-tune on another. The standard recipe in NLP and vision.
  • Multi-task learning — train one model on many tasks simultaneously, sharing parameters. T5 made this the default for NLP.
  • Meta-learning — learn how to learn from a distribution of tasks. MAML (Finn et al., 2017) is the reference algorithm; modern in-context learning in LLMs is meta-learning that emerged from scale rather than explicit design.
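Transfer learning reduces to "freeze the pretrained features, fit a small head." A linear caricature, assuming both tasks share a low-dimensional latent structure (PCA stands in for a pretrained backbone; every matrix here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(5, 20))                 # shared latent structure

# "Pretraining": learn a feature extractor on abundant task-A data.
X_a = rng.normal(size=(200, 5)) @ G
Xc = X_a - X_a.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
encoder = Vt[:5].T                           # frozen 20 -> 5 feature map

# "Fine-tuning": task B trains only a small head on the frozen features.
X_b = rng.normal(size=(100, 5)) @ G
y_b = X_b @ rng.normal(size=20)              # hypothetical task-B targets
Z = X_b @ encoder                            # reuse the pretrained features
head, *_ = np.linalg.lstsq(Z, y_b, rcond=None)

mse = np.mean((Z @ head - y_b) ** 2)
print(mse)    # tiny: task-B signal lives in the shared 5-dim subspace
```

Transfer works here precisely because the two tasks share the subspace spanned by G; when tasks share nothing, frozen features carry nothing over, which is the standard caveat.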

Choosing a paradigm

The task usually picks for you:

  • Have abundant labels? Supervised.
  • Want to discover structure? Unsupervised.
  • Sequential decisions with reward? Reinforcement.
  • Cheap data + expensive labels? Self-supervised pretraining + supervised fine-tuning, in that order.

Related pages

  • ERM — the formal core of supervised learning.
  • PAC Learning — the theoretical framework for sample complexity.
  • RL Overview — the reinforcement track.

Released under the MIT License. Content imported and adapted from NoteNextra.