Model Deep Dive

Llama 3.1: 8B Parameters

A concrete walkthrough of Meta's Llama 3.1-8B architecture: every dimension, every weight matrix, and exactly how much memory each piece consumes when served with vLLM.

At a glance:
Parameters: 8.03B
Layers: 32
Context Length: 128K
GQA Ratio: 4:1
Model Size (BF16): 16.1 GB
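The headline size follows directly from the parameter count: BF16 stores each weight in two bytes. A quick sanity check (decimal gigabytes, weights only, no KV cache or activations):

```python
params = 8.03e9      # total parameter count from the figure above
bf16_bytes = 2       # BF16 = 16 bits = 2 bytes per weight
print(f"{params * bf16_bytes / 1e9:.1f} GB")  # -> 16.1 GB of weights
```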
Core dimensions:
Hidden Dim: 4,096
FFN Dim: 14,336
Query Heads: 32
KV Heads: 8
Head Dim: 128
Vocab Size: 128,256
RoPE Base θ: 500,000
Activation: SiLU (SwiGLU)
Forward pass: Input Token IDs → × 32 transformer blocks → lm_head → Softmax → Next Token
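That flow maps onto a standard pre-norm decoder loop. A schematic sketch only; the module names (embed_tokens, attn, mlp, and so on) are illustrative stand-ins, not vLLM's or any library's actual API:

```python
def forward(token_ids, model):
    x = model.embed_tokens(token_ids)         # [seq_len, 4096] hidden states
    for block in model.layers:                # 32 identical decoder blocks
        h = x + block.attn(block.norm1(x))    # RMSNorm -> GQA attention (RoPE on Q/K), residual
        x = h + block.mlp(block.norm2(h))     # RMSNorm -> SwiGLU FFN (4096 -> 14336 -> 4096), residual
    x = model.final_norm(x)                   # final RMSNorm
    logits = model.lm_head(x)                 # project to the 128,256-token vocabulary
    return logits                             # softmax over these gives the next-token distribution
```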

Each KV head serves 4 query heads. The grouping below shows one attention block with all 32 Q heads grouped under the 8 KV heads.

GQA grouping (8 KV heads, 32 query heads):
KV 0 ← Q0–Q3 (Group 0) · KV 1 ← Q4–Q7 (Group 1) · KV 2 ← Q8–Q11 (Group 2) · KV 3 ← Q12–Q15 (Group 3) · KV 4 ← Q16–Q19 (Group 4) · KV 5 ← Q20–Q23 (Group 5) · KV 6 ← Q24–Q27 (Group 6) · KV 7 ← Q28–Q31 (Group 7)
4 Q heads share 1 KV head = 4× less KV cache.
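That 4× saving is easy to quantify. A small estimate of per-token KV cache in BF16, using the usual 2 × layers × kv_heads × head_dim × bytes accounting (vLLM's paged allocator adds block-level overhead on top of this):

```python
n_layers, head_dim, bf16_bytes = 32, 128, 2

def kv_bytes_per_token(n_kv_heads):
    # Keys and values cached for every layer: 2 * layers * heads * head_dim * bytes
    return 2 * n_layers * n_kv_heads * head_dim * bf16_bytes

gqa = kv_bytes_per_token(8)    # 131,072 bytes = 128 KiB per token with 8 KV heads
mha = kv_bytes_per_token(32)   # 524,288 bytes = 512 KiB per token if all 32 heads kept K/V
print(gqa, mha, mha // gqa)    # -> 131072 524288 4
# At the full 128K context, GQA still needs 131,072 tokens * 128 KiB ≈ 16 GiB per sequence.
```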