Tokenization

BPE Tokenizer

Before text reaches the transformer, it is split into subword tokens by Byte-Pair Encoding (BPE). Each token maps to an integer ID, which in turn indexes a dense embedding vector.

Example: "The quick brown fox" → 4 tokens (· marks a leading space)

    Pos  Token     ID
    0    "The"     1000
    1    "·quick"  2001
    2    "·brown"  2002
    3    "·fox"    2003
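The ID-to-embedding step can be sketched as a table lookup. This is a toy illustration: the IDs match the example above, but the vectors are random and the dimension is tiny (real models learn the table and use dimensions in the thousands).

```python
import random

random.seed(0)
EMBED_DIM = 4  # toy dimension; real models use 1024+

# Toy embedding table: token ID -> dense vector (in a real model, a trained matrix)
embedding_table = {tid: [random.random() for _ in range(EMBED_DIM)]
                   for tid in [1000, 2001, 2002, 2003]}

token_ids = [1000, 2001, 2002, 2003]  # "The", "·quick", "·brown", "·fox"
embeddings = [embedding_table[tid] for tid in token_ids]
print(len(embeddings), len(embeddings[0]))  # 4 vectors, each of dimension 4
```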

How BPE works

1. Start with individual bytes as the vocabulary.

2. Find the most frequent adjacent pair in the training data.

3. Merge that pair into a new token. Add it to the vocab.

4. Repeat until the vocabulary reaches the target size (typically 32K–128K tokens).
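The four steps above can be sketched as a minimal training loop. This is an illustrative implementation, not any production tokenizer: it merges greedily left to right and ignores pre-tokenization, special tokens, and efficiency.

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Step 2: count every adjacent pair and return the most frequent one
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def bpe_train(text, num_merges):
    # Step 1: the initial vocabulary is individual bytes
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):  # step 4: repeat up to the merge budget
        if len(tokens) < 2:
            break
        pair = most_frequent_pair(tokens)
        merges.append(pair)
        # Step 3: replace every occurrence of the pair with the merged token
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_train("aaabdaaabac", num_merges=2)
print(merges)  # first merge is ("a", "a") -> "aa", then ("aa", "a") -> "aaa"
print(tokens)
```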

Why this matters for vLLM

KV cache size: every token position stores a K and a V vector for every attention layer, so more tokens means more GPU memory.
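The per-token cost is easy to compute. The dimensions below are assumptions for a Llama-3-8B-like model (32 layers, 8 KV heads, head dim 128, fp16); substitute your model's actual config.

```python
# Per-token KV cache = 2 (K and V) * layers * kv_heads * head_dim * bytes per element.
# Dims below are assumed, Llama-3-8B-like values; check your model's config.
layers, kv_heads, head_dim, dtype_bytes = 32, 8, 128, 2  # fp16
per_token = 2 * layers * kv_heads * head_dim * dtype_bytes
print(per_token)  # 131072 bytes = 128 KiB per token
print(4096 * per_token // 2**20)  # a 4096-token request needs 512 MiB of KV cache
```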

Block allocation: vLLM allocates KV cache in fixed-size blocks. The token count determines how many blocks a request needs.
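The block count is a ceiling division over the token count. The block size of 16 below is vLLM's historical default, used here as an assumption:

```python
import math

BLOCK_SIZE = 16  # assumed tokens per block (vLLM's historical default)

def blocks_needed(num_tokens: int) -> int:
    # Each request gets ceil(tokens / block_size) fixed-size blocks;
    # the last block may be only partially filled.
    return math.ceil(num_tokens / BLOCK_SIZE)

print(blocks_needed(4))   # 1 -- the 4-token example above fits in one block
print(blocks_needed(17))  # 2 -- one full block plus one token spilling into a second
```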

Prefix sharing: shared prefixes are detected at the token level, so consistent tokenization is critical for cache reuse.
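Token-level prefix matching can be sketched as below. The token IDs reuse the example table above; note that vLLM actually reuses cache at block granularity, so only fully shared blocks are deduplicated.

```python
def shared_prefix_len(a, b):
    # Count how many leading token IDs two requests have in common
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

shared = [1000, 2001, 2002, 2003]  # "The quick brown fox" as a shared prompt prefix
req_a = shared + [3105]            # hypothetical continuation tokens
req_b = shared + [4200]
print(shared_prefix_len(req_a, req_b))  # 4 -- KV entries for these positions can be reused
```

If two clients tokenize the same prefix differently (different tokenizer version or normalization), the ID sequences diverge at position 0 and nothing is shared.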