LLaMA 1 & 2

LLaMA: Open and Efficient Foundation Language Models (Touvron et al., Meta, Feb 2023) released a family of decoder-only Transformers — 7B, 13B, 33B, and 65B parameters trained on 1–1.4T tokens of public data, well beyond the Chinchilla-optimal budget for those sizes. The models were initially gated for academic research; the weights leaked widely within a week. Five months later, Meta released LLaMA 2 under a largely permissive community licence. LLaMA is the model family that re-opened frontier-quality LLMs to the open-source community after GPT-4 closed the door.

LLaMA 1 (Feb 2023)

LLaMA 1 was a research model, not a product. The recipe:

  • Architecture — standard decoder-only Transformer with three modern tweaks: pre-norm using RMSNorm, SwiGLU activations, Rotary Position Embeddings (RoPE).
  • Sizes — 7B, 13B, 33B, 65B parameters.
  • Data — 1T tokens (LLaMA-7/13B) and 1.4T tokens (LLaMA-33/65B) from CommonCrawl, C4, GitHub, Wikipedia, books, ArXiv, Stack Exchange. All public sources.
  • Compute — the 65B model trained for ~21 days on 2048 A100 GPUs (a back-of-envelope check follows the list).
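
Those compute numbers can be sanity-checked with the standard ≈ 6ND estimate of training FLOPs (N parameters, D tokens). A rough check — the ~45% hardware utilisation below is an assumed figure, not one reported in the paper:

```python
# Back-of-envelope: does 65B params x 1.4T tokens fit in ~21 days
# on 2048 A100s? Uses the standard ~6*N*D training-FLOPs estimate.
N = 65e9           # parameters
D = 1.4e12         # training tokens
flops = 6 * N * D  # ~5.5e23 total training FLOPs

gpus = 2048
peak = 312e12      # A100 BF16 peak FLOP/s (dense)
mfu = 0.45         # ASSUMED utilisation; the paper's actual MFU may differ

seconds = flops / (gpus * peak * mfu)
print(f"{seconds / 86400:.1f} days")  # ~22 days, consistent with ~21 reported
```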

The headline result: LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with PaLM-540B and Chinchilla-70B. The key move was training well past the Chinchilla-optimal budget of roughly 20 tokens per parameter: LLaMA-13B's 1T tokens is about 4× its "optimal" allocation, and more than 3× the ~300B tokens GPT-3 was trained on. Overtraining small models this way produced much stronger models per parameter.

The leak

LLaMA was distributed via a request form to academic researchers. Within a week, the weights were posted to BitTorrent and 4chan. Meta did not acknowledge this publicly but did not actively combat it either. The leak was the practical end of "research-only" weight distribution at frontier scale — once 7B-level weights are in the wild, they are in the wild.

The Hugging Face community ran with LLaMA-7B/13B almost immediately. Within weeks: Alpaca (Stanford, instruction-tuned LLaMA), Vicuna (LMSYS), and dozens of derivative fine-tunes. The open-source LLM ecosystem of 2023–24 was built on LLaMA's leaked weights.

LLaMA 2 (July 2023)

Meta open-sourced LLaMA 2 in collaboration with Microsoft. Key features:

  • Sizes — 7B, 13B, 70B (a 34B model was trained but withheld, reportedly for lack of time to safety-tune it).
  • 2T training tokens — 40% more than LLaMA 1.
  • 4K context at base, with extended versions later.
  • Pretrained + chat versions — Meta released both the base model and an RLHF-tuned chat variant (a minimal loading sketch follows this list).
  • Permissive licence — commercial use allowed for products with under 700M MAU. Effectively open for the vast majority of users.
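
Both variants ship as ordinary Hugging Face checkpoints. A minimal loading sketch, assuming access to the gated meta-llama/Llama-2-7b-chat-hf repository (Meta's licence must be accepted on Hugging Face first):

```python
# Minimal sketch: load the gated Llama 2 chat checkpoint from Hugging Face
# (requires accepting Meta's licence and `huggingface-cli login` first).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Llama 2 chat models expect the [INST] ... [/INST] prompt format.
prompt = "[INST] Explain RMSNorm in one sentence. [/INST]"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```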

LLaMA-2-70B-chat was the first widely-available open-weights model competitive with closed models on user-facing chat tasks. The release was a significant open-source moment.

LLaMA's architectural choices

The LLaMA paper crystallised the modern decoder-only architecture that everyone now copies (a code sketch follows the list):

  • Pre-norm (Norm-Attention-Residual) instead of post-norm — gradient stability at depth.
  • RMSNorm instead of LayerNorm — slightly cheaper, no quality loss.
  • SwiGLU activation in feed-forward — small but consistent gain over ReLU/GELU.
  • RoPE position encoding — relative-position information via rotations in Q,K subspaces.
  • No bias in linear layers — small parameter savings.

Almost every prominent open LLM since — Mistral, Qwen, Yi, DeepSeek, Phi-3 — follows this template closely, often adding grouped-query attention on top. The "LLaMA architecture" is the de-facto standard.

Open-source explosion

The 12 months after LLaMA 1's release saw an explosion of derivative work:

  • Instruction-tuned LLaMAs — Alpaca, Vicuna, WizardLM, Orca.
  • LoRA / QLoRA fine-tuning — low-rank adapter methods that made fine-tuning LLaMA-class models feasible on a single GPU (a minimal sketch follows this list).
  • Quantised inference — llama.cpp, GGUF, ExLlama — made LLaMA runnable on consumer CPUs and laptops.
  • Multilingual variants — Chinese-LLaMA-Alpaca, BELLE, Sabiá.
  • Code models — Code Llama (Meta, 2023), built on LLaMA 2.
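
LoRA's core trick is small enough to sketch: freeze the pretrained weight W and learn a low-rank update BA added to its output. A minimal, self-contained version — an illustration of the idea, not the peft library's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained W
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # B=0: no-op at init
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha/r) * B A x ; only A and B receive gradients
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# usage: wrap, e.g., a frozen attention projection
layer = LoRALinear(nn.Linear(4096, 4096, bias=False), r=8)
y = layer(torch.randn(2, 16, 4096))
```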

By late 2023, the open-source ecosystem had become a serious counterweight to the closed frontier. LLaMA's role was foundational.

What followed

  • LLaMA 3 (April 2024) — see LLaMA 3. 8B / 70B at launch, with the 405B following in LLaMA 3.1; ~15T training tokens.
  • LLaMA 3.1, 3.2, 3.3 — incremental updates through 2024.
  • LLaMA 4 (2025) — the next-generation Meta open release.

Meta has remained the dominant open-frontier-model producer. The LLaMA line is the spine of the open-source LLM ecosystem.

See also

  • LLaMA 3 — the modern frontier-quality successor.
  • Mistral — the European open-models contender.
  • Open Models — the broader open-LLM ecosystem.
