
Chunking

Chunking is the process of splitting source documents into smaller pieces before they are embedded and indexed for retrieval. The chunk size decision is a direct trade-off: small chunks give precise, targeted retrieval but can strip away surrounding context the model needs to answer well; large chunks carry more context but dilute the relevance signal because a single chunk now covers many topics.

Common strategies are fixed-size chunking (e.g. 500 tokens with 50-token overlap), semantic chunking that splits on natural section or paragraph boundaries, and recursive chunking that breaks large pieces into smaller ones hierarchically so retrieval can match at multiple granularities. In practice, chunking is often the single biggest quality lever in a RAG pipeline — more than embedding model choice or retrieval algorithm — because bad chunks cap the ceiling of everything downstream.
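The fixed-size-with-overlap strategy above can be sketched in a few lines. This is a minimal illustration, not a production splitter: it operates on a pre-tokenized list (a real pipeline would use the embedding model's own tokenizer), and the function name and defaults are ours.

```python
def chunk_fixed(tokens, size=500, overlap=50):
    """Split a token list into fixed-size chunks, each sharing
    `overlap` tokens with its predecessor so sentences cut at a
    boundary still appear whole in at least one chunk."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end
    return chunks
```

The overlap is what makes the window size forgiving: a fact straddling a chunk boundary is retrievable from the neighboring chunk, at the cost of indexing some tokens twice.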

Example

A legal-research assistant indexes a corpus of 2,000 case PDFs. The first pass uses 1,500-token fixed chunks; retrieved results often contain the right case but bury the relevant paragraph inside unrelated facts, and the generator drifts. The team rebuilds the index with 400-token chunks split on paragraph boundaries and adds a parent-document lookup so the model sees both the precise chunk and its surrounding section. Answer faithfulness on the eval set jumps from 0.71 to 0.84 without changing the embedding model or the generator prompt.
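The parent-document pattern in the example — index small paragraph chunks, but hand the generator the enclosing section — can be sketched as follows. Everything here is a toy stand-in: real systems would score chunks with embeddings rather than the word-overlap heuristic used below, and the function names are ours.

```python
def build_index(sections):
    """Split each parent section on paragraph boundaries and record
    which section each small chunk came from."""
    index = []  # (chunk_text, parent_section_id) pairs
    for pid, section in enumerate(sections):
        for para in section.split("\n\n"):
            para = para.strip()
            if para:
                index.append((para, pid))
    return index

def retrieve_with_parent(query, index, sections):
    """Score chunks by query-word overlap (a placeholder for vector
    similarity) and return both the best chunk and its parent section."""
    q = set(query.lower().split())
    def score(chunk):
        return len(q & set(chunk.lower().split()))
    best_chunk, pid = max(index, key=lambda item: score(item[0]))
    return best_chunk, sections[pid]
```

The design point is the decoupling: the small chunk sharpens the retrieval signal, while the parent lookup restores the surrounding context the generator needs, so neither goal compromises the other.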
