Chunking
Chunking is the process of splitting source documents into smaller pieces before they are embedded and indexed for retrieval. Chunk size is a direct trade-off: small chunks give precise, targeted retrieval but can strip away surrounding context the model needs to answer well; large chunks carry more context but dilute the relevance signal, because a single chunk now covers many topics.
Common strategies are fixed-size chunking (e.g., 500 tokens with a 50-token overlap), semantic chunking that splits on natural section or paragraph boundaries, and recursive chunking that breaks large pieces into smaller ones hierarchically so retrieval can match at multiple granularities. In practice, chunking is often the single biggest quality lever in a RAG pipeline, more than embedding model choice or retrieval algorithm, because bad chunks cap the ceiling of everything downstream.
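The fixed-size strategy above can be sketched in a few lines. This is a minimal illustration, not a production implementation: it operates on a pre-tokenized list (real pipelines would use a tokenizer matched to the embedding model), and the function name and defaults are illustrative, echoing the 500-token/50-overlap example.

```python
def chunk_fixed(tokens, size=500, overlap=50):
    """Split a token sequence into fixed-size chunks, each sharing
    `overlap` tokens with the previous chunk so context at chunk
    boundaries is not lost."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already reached the end of the document
    return chunks
```

The overlap is what distinguishes this from naive splitting: a sentence that straddles a boundary appears whole in at least one chunk, at the cost of indexing some tokens twice.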
Example
A legal-research assistant indexes a corpus of 2,000 case PDFs. The first pass uses 1,500-token fixed chunks; retrieved results often contain the right case but bury the relevant paragraph inside unrelated facts, and the generator drifts. The team rebuilds the index with 400-token chunks split on paragraph boundaries and adds a parent-document lookup so the model sees both the precise chunk and its surrounding section. Answer faithfulness on the eval set jumps from 0.71 to 0.84 without changing the embedding model or the generator prompt.
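The paragraph-boundary split plus parent-document lookup from this example can be sketched as follows. Everything here is an assumption for illustration: tokens are approximated by whitespace-split words, and the `Chunk` type, function names, and the in-memory `parent_store` dict stand in for whatever index the real system uses.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    parent_id: str  # id of the enclosing section or document

def split_on_paragraphs(parent_id, text, max_tokens=400):
    """Pack whole paragraphs into chunks of at most `max_tokens` words.
    Whitespace word counts approximate tokens for illustration only."""
    chunks, current = [], []
    for para in text.split("\n\n"):
        words = para.split()
        # start a new chunk rather than split a paragraph mid-way
        if current and len(current) + len(words) > max_tokens:
            chunks.append(Chunk(" ".join(current), parent_id))
            current = []
        current.extend(words)
    if current:
        chunks.append(Chunk(" ".join(current), parent_id))
    return chunks

def expand_to_parent(chunk, parent_store):
    """Return the precise retrieved chunk plus its surrounding section,
    so the generator sees both the match and its context."""
    return chunk.text, parent_store[chunk.parent_id]
```

Retrieval scores the small chunks, but generation receives the pair returned by `expand_to_parent`, which is the mechanism behind the faithfulness gain the example describes.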
Related Resources
Chunking Strategies for RAG: Fixed, Semantic, Recursive, and Parent-Document
Chunking is the single biggest quality lever in most RAG pipelines. This tutorial walks through fixed-size, semantic, recursive, and parent-document chunking on a hypothetical legal-research assistant — with diagnoses, fixes, and failure modes.
Context Compression Techniques (2026)
Three families of context compression — summarization, semantic chunking, and token-level compression. Fidelity vs compression rate trade-offs and when each fits.