Computer Vision

A topic-organized track for computer vision: foundations first, then deep architectures, then a paper-by-paper survey of modern research areas.

Reading path

  1. Foundations — image formation, filters, features, geometry. The math you need before anything is "deep".
  2. Deep Vision Architectures — CNN backbones, detection, segmentation, ViT.
  3. Modern Topics — ten research areas, each surveyed as a paper list:
    • Semantic Segmentation
    • Vision-Language Models
    • Neural Rendering (NeRF, 3D Gaussian Splatting)
    • Image and Video Generation
    • Geometric Deep Learning for CV
    • Representation Learning (SimCLR → DINOv2)
    • Correspondence & Structure-from-Motion
    • Safety, Robustness, Evaluation
    • Embodied CV & Robotics
    • Open-Vocabulary Detection

TIP

Vision-Language Models overlaps the LLM track — see also LLM · Multi-Modal LLMs.

Released under the MIT License. Content imported and adapted from NoteNextra.