Computer Vision

A topic-organized track for computer vision: foundations first, then deep architectures, then a paper-by-paper survey of modern research areas.

Reading path

Foundations — image formation, filters, features, geometry. The math you need before anything is "deep".
Deep Vision Architectures — CNN backbones, detection, segmentation, ViT.
Modern Topics — ten research areas, each surveyed as a paper list:
- Semantic Segmentation
- Vision-Language Models
- Neural Rendering (NeRF, 3D Gaussian Splatting)
- Image and Video Generation
- Geometric Deep Learning for CV
- Representation Learning (SimCLR → DINOv2)
- Correspondence & Structure-from-Motion
- Safety, Robustness, Evaluation
- Embodied CV & Robotics
- Open-Vocabulary Detection

TIP

Vision-Language Models overlaps the LLM track — see also LLM · Multi-Modal LLMs.