
Chunking

Chunking is the process of splitting source documents into smaller pieces before they are embedded and indexed for retrieval. The chunk size decision is a direct trade-off: small chunks give precise, targeted retrieval but can strip away surrounding context the model needs to answer well; large chunks carry more context but dilute the relevance signal because a single chunk now covers many topics.

Common strategies are fixed-size chunking (e.g. 500 tokens with 50-token overlap), semantic chunking that splits on natural section or paragraph boundaries, and recursive chunking that breaks large pieces into smaller ones hierarchically so retrieval can match at multiple granularities. In practice, chunking is often the single biggest quality lever in a RAG pipeline — more than embedding model choice or retrieval algorithm — because bad chunks cap the ceiling of everything downstream.
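The fixed-size-with-overlap strategy above can be sketched in a few lines. This is a minimal illustration, not a production splitter: it operates on a pre-tokenized list (a real pipeline would use the embedding model's own tokenizer), and the function name and defaults are ours.

```python
def chunk_fixed(tokens, size=500, overlap=50):
    """Split a token list into fixed-size chunks, each sharing
    `overlap` tokens with its predecessor so sentences cut at a
    boundary still appear whole in at least one chunk."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end
    return chunks
```

The overlap is what makes the window size forgiving: a fact straddling a chunk boundary is retrievable from the neighboring chunk, at the cost of indexing some tokens twice.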

Example

A legal-research assistant indexes a corpus of 2,000 case PDFs. The first pass uses 1,500-token fixed chunks; retrieved results often contain the right case but bury the relevant paragraph inside unrelated facts, and the generator drifts. The team rebuilds the index with 400-token chunks split on paragraph boundaries and adds a parent-document lookup so the model sees both the precise chunk and its surrounding section. Answer faithfulness on the eval set jumps from 0.71 to 0.84 without changing the embedding model or the generator prompt.
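The parent-document pattern in the example — index small paragraph chunks, but hand the generator the enclosing section — can be sketched as follows. Everything here is a toy stand-in: real systems would score chunks with embeddings rather than the word-overlap heuristic used below, and the function names are ours.

```python
def build_index(sections):
    """Split each parent section on paragraph boundaries and record
    which section each small chunk came from."""
    index = []  # (chunk_text, parent_section_id) pairs
    for pid, section in enumerate(sections):
        for para in section.split("\n\n"):
            para = para.strip()
            if para:
                index.append((para, pid))
    return index

def retrieve_with_parent(query, index, sections):
    """Score chunks by query-word overlap (a placeholder for vector
    similarity) and return both the best chunk and its parent section."""
    q = set(query.lower().split())
    def score(chunk):
        return len(q & set(chunk.lower().split()))
    best_chunk, pid = max(index, key=lambda item: score(item[0]))
    return best_chunk, sections[pid]
```

The design point is the decoupling: the small chunk sharpens the retrieval signal, while the parent lookup restores the surrounding context the generator needs, so neither goal compromises the other.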
