Graph Convolutional Networks
A graph convolutional network applies a learnable linear operation to each node, mixing in information from its neighbours. The classical GCN of Kipf & Welling is the simplest reference architecture and the entry point to graph deep learning. The trick is identifying what "convolution" should mean on an irregular graph — and the answer comes from spectral graph theory.
Spectral motivation
For a graph with $n$ nodes, adjacency matrix $A$, and diagonal degree matrix $D$, the symmetric normalised Laplacian is $L = I_n - D^{-1/2} A D^{-1/2}$.
Its eigendecomposition $L = U \Lambda U^\top$ supplies a graph Fourier basis: the eigenvectors $U$ act as Fourier modes and the eigenvalues in $\Lambda$ as frequencies. Spectral convolution filters a node signal $x$ in that basis: $g_\theta \star x = U\, g_\theta(\Lambda)\, U^\top x$.
Computing the eigendecomposition costs $O(n^3)$, and each filtering costs $O(n^2)$, which is prohibitive for large graphs. ChebNet (Defferrard et al., NeurIPS 2016) avoids both by approximating $g_\theta(\Lambda)$ with a truncated Chebyshev polynomial expansion of order $K$, so that filtering reduces to $K$ sparse multiplications by the (rescaled) Laplacian.
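To make the spectral picture concrete, here is a minimal sketch of exact spectral filtering on a toy graph. The graph, the low-pass response, and all variable names are my own illustration, not from any paper:

```python
import numpy as np

# Toy graph: 4-node path 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)

# Symmetric normalised Laplacian: L = I - D^{-1/2} A D^{-1/2}
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt

# Graph Fourier basis: eigenvectors U, frequencies lam (eigenvalues in [0, 2])
lam, U = np.linalg.eigh(L)

def spectral_filter(x, g):
    """Filter a node signal x with spectral response g: U g(Lambda) U^T x."""
    return U @ (g(lam) * (U.T @ x))

x = np.array([1.0, 0.0, 0.0, 0.0])                   # impulse at node 0
x_smooth = spectral_filter(x, lambda l: np.exp(-l))  # low-pass: damp high freqs
print(x_smooth)
```

This is exactly the $O(n^2)$-per-filter pipeline that ChebNet's polynomial approximation is designed to avoid.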
The Kipf-Welling GCN
Semi-Supervised Classification with Graph Convolutional Networks (Kipf & Welling, ICLR 2017) takes the Chebyshev approximation to first order ($K = 1$), assumes $\lambda_{\max} \approx 2$, ties the two remaining coefficients ($\theta = \theta_0 = -\theta_1$), and applies a renormalisation trick, yielding the layer rule

$$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$$

where $\tilde{A} = A + I_n$ is the adjacency matrix with self-loops, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ its degree matrix, $H^{(l)}$ the node features at layer $l$ (with $H^{(0)} = X$), and $W^{(l)}$ a learnable weight matrix. Three ingredients, illustrated in the sketch after this list:
- Self-loops ($\tilde{A} = A + I_n$) keep the node's own features in the mix.
- Symmetric normalisation ($\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$) prevents activation magnitudes from depending on node degree.
- Linear feature transformation ($H^{(l)} W^{(l)}$) followed by a pointwise nonlinearity $\sigma$ (typically ReLU).
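A minimal NumPy sketch of this layer rule (function and variable names are mine, not from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(A, H, W, activation=relu):
    """One GCN layer: sigma(D_tilde^{-1/2} A_tilde D_tilde^{-1/2} H W).

    A : (n, n) adjacency matrix, no self-loops
    H : (n, d_in) node features
    W : (d_in, d_out) weight matrix
    """
    n = A.shape[0]
    A_tilde = A + np.eye(n)                  # add self-loops
    d = A_tilde.sum(axis=1)                  # degrees of A_tilde
    S = A_tilde / np.sqrt(np.outer(d, d))    # D_tilde^{-1/2} A_tilde D_tilde^{-1/2}
    return activation(S @ H @ W)
```

In practice $S$ is precomputed once as a sparse matrix and reused at every layer, since it depends only on the graph.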
A two-layer GCN, $Z = \mathrm{softmax}\big(\hat{A}\,\mathrm{ReLU}(\hat{A} X W^{(0)})\, W^{(1)}\big)$ with $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, was the original benchmark for semi-supervised node classification on Cora, Citeseer, and Pubmed, beating contemporaneous label-propagation and DeepWalk baselines by large margins.
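A sketch of that two-layer model in PyTorch, assuming a dense precomputed $\hat{A}$ (class and variable names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerGCN(nn.Module):
    """Sketch of the two-layer model:
    Z = softmax(A_hat @ ReLU(A_hat @ X @ W0) @ W1), with A_hat precomputed."""

    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.W0 = nn.Linear(d_in, d_hidden, bias=False)
        self.W1 = nn.Linear(d_hidden, n_classes, bias=False)

    def forward(self, A_hat, X):
        H = F.relu(A_hat @ self.W0(X))   # first propagation + ReLU
        return A_hat @ self.W1(H)        # logits; softmax lives in the loss

# A_hat for a toy 3-node triangle graph:
A = torch.ones(3, 3) - torch.eye(3)
A_tilde = A + torch.eye(3)
d = A_tilde.sum(1)
A_hat = A_tilde / torch.sqrt(d[:, None] * d[None, :])

model = TwoLayerGCN(d_in=5, d_hidden=16, n_classes=3)
logits = model(A_hat, torch.randn(3, 5))
# Train with F.cross_entropy evaluated on the labelled nodes only
# (that restriction is what makes the setup semi-supervised).
```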
What GCN computes, intuitively
Each layer is a single hop of neighbour averaging plus a linear transform. Stacking $k$ layers therefore gives each node a receptive field of its $k$-hop neighbourhood: after two layers, a node's representation depends on its neighbours' neighbours, and so on.
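A toy check of the receptive-field claim (my construction, a 5-node path graph): each application of the propagation matrix spreads an impulse exactly one hop further.

```python
import numpy as np

# Path graph 0-1-2-3-4; impulse at node 0.
n = 5
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
A_tilde = A + np.eye(n)
d = A_tilde.sum(1)
S = A_tilde / np.sqrt(np.outer(d, d))   # normalised propagation matrix

x = np.zeros(n)
x[0] = 1.0
for k in range(1, 4):
    x = S @ x
    print(f"{k} layer(s): nonzero at nodes {np.flatnonzero(x > 1e-12)}")
# 1 layer(s): nonzero at nodes [0 1]
# 2 layer(s): nonzero at nodes [0 1 2]
# 3 layer(s): nonzero at nodes [0 1 2 3]
```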
Over-smoothing
Stacking many GCN layers produces a counter-intuitive failure: node representations converge to identical values across the graph (in the limit, they project onto the principal eigenvector of $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, which is proportional to the square root of node degree and carries no feature information). This failure mode is called over-smoothing. Common mitigations:
- Skip connections (similar to ResNet) — JKNet (Xu et al., ICML 2018), GCNII (Chen et al., ICML 2020).
- Graph normalisation layers that explicitly counter representational collapse.
- Mixed local–global architectures (e.g., Graph Transformers).
The over-smoothing limitation is the reason most production GNNs are 2–3 layers deep.
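To see the collapse numerically, a small experiment (my construction: a random connected graph with made-up features, not from any paper): repeated propagation drives every node's degree-normalised embedding toward the same vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

# Ring backbone (keeps the graph connected) plus random chords.
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = 1.0
chords = np.triu((rng.random((n, n)) < 0.1).astype(float), 1)
A = np.maximum(A + A.T, chords + chords.T)

A_tilde = A + np.eye(n)
d = A_tilde.sum(axis=1)
S = A_tilde / np.sqrt(np.outer(d, d))    # normalised propagation matrix

H = rng.standard_normal((n, 8))          # random node features
for k in [1, 2, 4, 8, 16, 32]:
    Hk = np.linalg.matrix_power(S, k) @ H
    rows = Hk / np.sqrt(d)[:, None]      # undo the sqrt-degree scaling
    spread = np.linalg.norm(rows - rows.mean(axis=0), axis=1).mean()
    print(f"{k:2d} propagation steps: mean spread = {spread:.5f}")
# The spread decays geometrically: apart from a sqrt(degree) factor,
# every node ends up with the same embedding, so the features that
# distinguished nodes are washed out.
```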
What GCN is good for
- Node classification on citation, social, and recommendation graphs.
- Link prediction and graph classification with appropriate readout heads.
- Drug discovery — molecules as graphs; a workhorse architecture in cheminformatics.
- Recommender systems — bipartite user-item graphs, with GCN as the embedding model.
What to read next
- Message Passing & GraphSAGE — the more general framework GCN is a special case of.
- Graph Attention Networks — replace fixed adjacency weighting with learned attention.
- Convolution & Pooling — the regular-grid analogue.