Contextual Retrieval
Contextual Retrieval is a technique introduced by Anthropic in 2024 that prepends a short, chunk-specific context summary to each chunk before the chunk is embedded and added to the BM25 index. The summary, generated once per chunk at index time by a lightweight LLM call, describes what the chunk is about in the frame of its parent document, so a chunk like "it was launched in Q3" becomes "This chunk from the 2024 Acme Product Report discusses the Widget Pro; it was launched in Q3". Both the embedding vector and the BM25 tokens then carry document-level context that is otherwise lost when chunking splits a document into short segments. Anthropic reported a 35% reduction in top-20 retrieval failure rate from contextual embeddings alone, rising to 49% when combined with contextual BM25, at the cost of one extra LLM call per chunk during indexing and no extra cost at query time.
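The indexing flow above can be sketched as follows. This is a minimal illustration, not Anthropic's implementation: `generate_context` stands in for the real LLM call (here it is a trivial stub that reads the document title), and the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IndexedChunk:
    text: str            # original chunk text, returned to the generator
    contextualized: str  # context + chunk, fed to the embedder and BM25

def generate_context(document: str, chunk: str) -> str:
    # Stand-in for the lightweight LLM call; a real system would prompt a
    # model with the full document and the chunk and use its response.
    title = document.splitlines()[0]
    return f"This chunk is from '{title}'."

def build_index(document: str, chunks: list[str]) -> list[IndexedChunk]:
    indexed = []
    for chunk in chunks:
        context = generate_context(document, chunk)  # one LLM call per chunk
        indexed.append(IndexedChunk(text=chunk,
                                    contextualized=f"{context} {chunk}"))
    return indexed
```

At query time nothing changes: the `contextualized` string is what gets embedded and tokenized for BM25, while the original `text` is what you pass to the generator once the chunk is retrieved.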
Example
A legal-research team has 180,000 chunks indexed from case filings. Retrieval on ambiguous references ("the defendant's motion to dismiss was granted") often fails because the chunk text alone does not identify which case or defendant is meant. They regenerate the index with contextual retrieval, adding a sentence like "From Smith v. Acme, 2023, discussing the defendant Acme's procedural motion" to each chunk before embedding. Indexing cost rises by one small LLM call per chunk, a one-time expense. Top-20 retrieval error on case-specific queries drops by roughly 40% with no change to the retriever or generator.
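The per-chunk context call is driven by a prompt that shows the model the whole document and the chunk to situate. The template below is a hypothetical sketch of what such a prompt might look like, not the team's actual wording:

```python
# Hypothetical prompt template for the per-chunk context-generation call.
CONTEXT_PROMPT = """\
<document>
{document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Give a short, succinct context that situates this chunk within the overall
document, for the purpose of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else."""

def make_prompt(document: str, chunk: str) -> str:
    # Fill the template; the LLM's reply is prepended to the chunk
    # before embedding and BM25 indexing.
    return CONTEXT_PROMPT.format(document=document, chunk=chunk)
```

Because the same full document is sent with every chunk, prompt caching (where the provider supports it) keeps the one-time indexing cost low.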