Context Rot

Context rot is the degradation of model performance as a context window fills up with more content. Attention spreads thin across a long prompt, retrieval from mid-context positions drops relative to the beginning and end (the "lost in the middle" pattern), and small low-value chunks crowd out high-value ones. The effect is well-documented across long-context evaluations on every major model family, though its severity varies by model and task. Context rot is the main reason "just stuff the whole document into the prompt" is not always a good replacement for retrieval, and it is why aggressive context engineering — compression, reranking, chunk budgeting — usually beats raw context length on quality metrics. The practical heuristic: more context helps up to a point, after which each additional token trades against the signal already in the prompt.
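The chunk-budgeting idea can be sketched in a few lines: score candidate chunks for relevance, then greedily pack the best ones into a fixed token budget instead of concatenating everything. Everything below is illustrative; the word-overlap score is a toy stand-in for a real reranker or embedding similarity, and `budget_context` is a hypothetical helper, not any particular library's API.

```python
def score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words present in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def budget_context(query: str, chunks: list[str], max_tokens: int) -> list[str]:
    """Keep the highest-scoring chunks that fit in max_tokens (counting whitespace tokens)."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())
        if used + cost <= max_tokens:
            picked.append(chunk)
            used += cost
    return picked

chunks = [
    "The API rate limit is 60 requests per minute.",
    "Our mascot is a cartoon octopus named Inky.",
    "Rate limit errors return HTTP status 429.",
    "The office coffee machine is on the third floor.",
]
context = budget_context("what is the API rate limit", chunks, max_tokens=20)
```

Under the budget, the two rate-limit chunks make the cut and the mascot and coffee-machine chunks are dropped: each excluded token is one that would otherwise dilute attention over the signal already in the prompt.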

Example

A research assistant gets a flat 200-document dump pasted into a long-context model. On a needle-in-a-haystack eval, retrieval of a fact is near-perfect when it sits at the beginning or end of the prompt, but drops well below half when it sits near position 100 of 200. The team switches to a RAG pipeline that retrieves the top ten relevant documents per query and drops the other 190. Accuracy on the same eval rises materially, and per-query cost falls by roughly 20x. The fix was not a better model; it was giving the model less, but better-targeted, context.
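The retrieval step in that fix can be sketched as a top-k selection over the document pool. The overlap score below is a toy stand-in for a real retriever (BM25 or embeddings), and the document set, the "Bluebird" needle, and `retrieve_top_k` are all invented for illustration.

```python
import heapq

def overlap(query: str, doc: str) -> float:
    """Toy retriever score: fraction of query words found in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / max(len(q), 1)

def retrieve_top_k(query: str, docs: list[str], k: int = 10) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return heapq.nlargest(k, docs, key=lambda d: overlap(query, d))

# 200 filler documents, with one needle fact buried near the middle.
docs = [f"Report {i}: quarterly metrics were nominal." for i in range(200)]
docs[100] = "Report 100: the launch codename is Bluebird."

top = retrieve_top_k("what is the launch codename", docs, k=10)
```

The model now sees 10 documents instead of 200, with the needle ranked first rather than buried at position 100, which is the shape of the accuracy and cost improvement described above.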
