Chunking

Splitting documents so retrieval actually surfaces the right passage.

RAG · 9 min · interactive

Retrieval-augmented generation (RAG) lets a model use documents it was never trained on. The first step is the least glamorous and the most consequential: cutting those documents into chunks small enough to embed and retrieve, but large enough to still make sense. Drag the sliders and watch the trade-off.

Chunk size: 30 words · Overlap: 6 words

Retrieval-augmented generation gives a language model access to knowledge it was never trained on. Documents are split into chunks, each chunk is embedded into a vector, and the vectors are stored in an index. At query time the question is embedded too, the nearest chunks are retrieved, and they are pasted into the prompt as context. Chunk size is a balance: chunks that are too large dilute the relevant sentence with noise and waste context, while chunks that are too small lose the surrounding meaning a passage needs to make sense. Overlap copies a few words across the boundary so an idea split between two chunks still survives in at least one of them.

Hover a chunk to highlight its words. The striped words sit in the overlap, so they appear in two chunks. Turn the size down and you get many tiny, context-poor chunks; turn it up and each chunk blurs several ideas together. Overlap is cheap insurance against splitting a thought across a boundary.
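The sliders boil down to a sliding window. Here is a minimal Python sketch, assuming chunks are counted in whitespace-separated words as in the demo (real pipelines often count model tokens instead). Each window holds `size` words and starts `size - overlap` words after the previous one, so the last `overlap` words of one chunk reappear at the start of the next. The helper name `chunk_words` is hypothetical, not part of any library.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[list[str]]:
    """Split text into windows of `size` words; neighbours share `overlap` words.

    Hypothetical helper: words are whitespace-separated, not model tokens.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    stride = size - overlap  # each window advances this many words
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end; avoid a redundant tail chunk
    return chunks


text = "one two three four five six seven eight nine ten"
for i, chunk in enumerate(chunk_words(text, size=4, overlap=1)):
    print(i, " ".join(chunk))
# 0 one two three four
# 1 four five six seven
# 2 seven eight nine ten
```

Here `four` and `seven` play the role of the striped words: each sits in two chunks. Raising `overlap` shrinks the stride, so the same text yields more chunks and more duplicated words.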

Why size is a balance

Chunks that are too big bury the one relevant sentence in noise and burn context tokens; chunks that are too small get retrieved without the surrounding context needed to interpret them. There's no universal number: the right size depends on your documents and your embedding model (which also caps how much text it can embed at once), which is exactly why it helps to see the trade-off directly.
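To put a number on "buried in noise", here is a toy sketch. It assumes a bag-of-words vector (lowercase word counts) compared by cosine similarity as a stand-in for a real embedding model, which captures meaning far better; the helpers `embed`, `cosine`, and `retrieve` and the sample sentences are hypothetical. The same relevant sentence is scored once as a small chunk on its own and once wrapped in unrelated filler, as it would be inside a large chunk.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (missing words count as 0).
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[tuple[float, str]]:
    # Score every chunk against the query and keep the k most similar.
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, reverse=True)[:k]


relevant = "refunds are accepted within thirty days of purchase"
filler = "our office is closed on public holidays and parking is available behind the building"
query = "within how many days are refunds accepted"

small_chunk = relevant
large_chunk = " ".join([filler, relevant, filler])

score_small = cosine(embed(query), embed(small_chunk))
score_large = cosine(embed(query), embed(large_chunk))
print(f"small: {score_small:.3f}  large: {score_large:.3f}")
# small: 0.668  large: 0.223
print(retrieve(query, [filler, relevant])[0][1])
# refunds are accepted within thirty days of purchase
```

Both chunks share the same five words with the query, but the filler inflates the large chunk's vector norm, so its similarity drops by roughly a factor of three. A bag-of-words model cannot show the opposite failure (a tiny chunk losing the context needed to interpret it); that needs a real embedding model.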

Why overlap

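A fixed window will sometimes slice a single idea in half. Overlap repeats the last few words (or tokens) of each chunk at the start of the next, so an idea no longer than the overlap is guaranteed to land whole in at least one chunk. Longer ideas can still be cut, which makes overlap cheap insurance against unlucky cuts rather than a cure.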
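A minimal sketch of that guarantee, using a hypothetical word-window chunker: `survives` (also hypothetical) reports whether a phrase appears whole in at least one chunk. With no overlap, a two-word phrase sitting on a boundary is cut; with an overlap of two it lands intact in the second chunk. The final loop brute-forces every position to check that any phrase no longer than the overlap survives.

```python
def chunk_words(words: list[str], size: int, overlap: int) -> list[list[str]]:
    # Hypothetical word-window chunker: windows of `size` words, advancing by size - overlap.
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # last window reaches the end of the text
    return chunks


def survives(words: list[str], phrase: list[str], size: int, overlap: int) -> bool:
    # True if some chunk contains `phrase` as a contiguous run of words.
    n = len(phrase)
    return any(
        chunk[i:i + n] == phrase
        for chunk in chunk_words(words, size, overlap)
        for i in range(len(chunk) - n + 1)
    )


words = "a b c d e f g h i j k l".split()
print(survives(words, ["d", "e"], size=4, overlap=0))  # False: cut between [a b c d] and [e f g h]
print(survives(words, ["d", "e"], size=4, overlap=2))  # True: whole in [c d e f]

# Brute-force check: every phrase no longer than the overlap survives, wherever it starts.
size, overlap = 6, 2
for length in range(1, overlap + 1):
    for start in range(len(words) - length + 1):
        assert survives(words, words[start:start + length], size, overlap)
```

A phrase longer than the overlap has no such guarantee: with `size=4, overlap=2`, the four-word run `b c d e` never fits inside a single window.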
