Model Deep Dive
Llama 3.1: 8B Parameters
A concrete walkthrough of Meta's Llama 3.1-8B architecture: every dimension, every weight matrix, and exactly how much memory each piece consumes when served with vLLM.
Parameters: 8.03B
Layers: 32
Context: 128K
GQA Ratio: 4:1
Model Size (BF16): 16.1 GB
Model Configuration
Hidden Dim: 4,096
FFN Dim: 14,336
Query Heads: 32
KV Heads: 8
Head Dim: 128
Vocab Size: 128,256
RoPE Base θ: 500,000
Activation: SiLU (SwiGLU)
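The headline 8.03B figure can be reproduced from these dimensions alone. Below is a minimal sketch in plain Python that sums the weight matrices implied by the table; the variable names are illustrative rather than the model's actual config keys, and it assumes the embedding and lm_head weights are untied, as they are in the 8B checkpoint.

```python
# Sketch: reconstruct Llama 3.1-8B's parameter count from the table above.
# Variable names are illustrative, not official config keys.
hidden, ffn = 4096, 14336
q_heads, kv_heads, head_dim = 32, 8, 128
vocab, layers = 128_256, 32

embed = vocab * hidden                          # input embedding table
attn = (hidden * q_heads * head_dim             # W_q: 4096 -> 4096
        + hidden * kv_heads * head_dim * 2      # W_k, W_v: 4096 -> 1024 each (GQA)
        + q_heads * head_dim * hidden)          # W_o: 4096 -> 4096
mlp = hidden * ffn * 3                          # gate, up, down (SwiGLU)
norms = 2 * hidden                              # two RMSNorm weights per block
block = attn + mlp + norms
lm_head = vocab * hidden                        # untied output projection

total = embed + layers * block + hidden + lm_head   # + final RMSNorm
print(f"{total:,} parameters")                  # 8,030,261,248
print(f"{total * 2 / 1e9:.2f} GB in BF16")      # ~16.06 GB
```

Note where the GQA ratio shows up: the K and V projections are 4,096 × 1,024 rather than 4,096 × 4,096, which saves roughly 0.8B parameters across the 32 layers.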
Layer-by-Layer Stack
Input Token IDs → × 32 transformer blocks → lm_head → Softmax → Next Token
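Each of those 32 blocks has the same internal structure: pre-norm grouped-query attention followed by a pre-norm SwiGLU MLP, each wrapped in a residual connection. The PyTorch sketch below wires up the shapes from the configuration table; it omits RoPE and KV caching, uses illustrative module names (they mirror the common Hugging Face layout but aren't guaranteed to match exactly), and needs PyTorch ≥ 2.4 for nn.RMSNorm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LlamaBlock(nn.Module):
    """Illustrative sketch of one Llama 3.1-8B decoder block."""
    def __init__(self, hidden=4096, ffn=14336, q_heads=32, kv_heads=8, head_dim=128):
        super().__init__()
        self.q_heads, self.kv_heads, self.head_dim = q_heads, kv_heads, head_dim
        self.q_proj = nn.Linear(hidden, q_heads * head_dim, bias=False)   # 4096 -> 4096
        self.k_proj = nn.Linear(hidden, kv_heads * head_dim, bias=False)  # 4096 -> 1024
        self.v_proj = nn.Linear(hidden, kv_heads * head_dim, bias=False)  # 4096 -> 1024
        self.o_proj = nn.Linear(q_heads * head_dim, hidden, bias=False)
        self.gate_proj = nn.Linear(hidden, ffn, bias=False)               # SwiGLU gate
        self.up_proj = nn.Linear(hidden, ffn, bias=False)
        self.down_proj = nn.Linear(ffn, hidden, bias=False)
        self.input_norm = nn.RMSNorm(hidden)
        self.post_attn_norm = nn.RMSNorm(hidden)

    def forward(self, x):                                  # x: (batch, seq, 4096)
        b, s, _ = x.shape
        h = self.input_norm(x)
        q = self.q_proj(h).view(b, s, self.q_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(h).view(b, s, self.kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(h).view(b, s, self.kv_heads, self.head_dim).transpose(1, 2)
        rep = self.q_heads // self.kv_heads                # 4 query heads per KV head
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.o_proj(attn.transpose(1, 2).reshape(b, s, -1))
        h = self.post_attn_norm(x)
        x = x + self.down_proj(F.silu(self.gate_proj(h)) * self.up_proj(h))
        return x
```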
Grouped Query Attention: 4:1 Sharing
Each KV head serves 4 query heads. This diagram shows one attention block with all 32 Q heads grouped under 8 KV heads.
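The payoff is KV-cache memory: only the 8 KV heads per layer need to be cached, not all 32 query heads. A rough back-of-the-envelope for BF16 serving follows; this is a sketch of the raw tensor sizes only, and vLLM's paged allocator adds its own block-level overhead on top.

```python
# Sketch: per-token KV-cache footprint for Llama 3.1-8B in BF16.
layers, kv_heads, head_dim, dtype_bytes = 32, 8, 128, 2

per_token = 2 * layers * kv_heads * head_dim * dtype_bytes      # K and V
print(f"{per_token / 1024:.0f} KiB per token")                  # 128 KiB

context = 128 * 1024
print(f"{per_token * context / 2**30:.0f} GiB at 128K context")  # 16 GiB

# With full multi-head attention (32 KV heads) the cache would be 4x larger:
print(f"{per_token * 4 * context / 2**30:.0f} GiB without GQA")  # 64 GiB
```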