
Lost in the Middle

Lost in the middle is the finding from Liu et al. (2023) that language models' ability to recall information degrades sharply for content placed in the middle of a long context, even when the total context length is well under the nominal window limit. Across the models they tested, performance was U-shaped: strongest for information at the beginning of the prompt, second-strongest at the end, and weakest for information buried in the middle. The effect appears in both open-source and frontier models and is not fully explained by position embeddings; the distribution of prompt structures in training data is a contributing factor. The practical implication is concrete: important information should not be buried in the middle of a pile of retrieved passages but placed near the beginning or end of the prompt, and layout matters alongside length when designing long prompts.

Example

A RAG pipeline returns ten retrieved passages concatenated in retrieval order and appends the user question at the end. Evaluations show that when the ground-truth passage happens to fall at position 5 or 6, answer accuracy drops well below what it is when the same passage is at position 1 or 10. The team reorders the prompt: highest-ranked passage first, second-highest last, and the middle positions filled with lower-ranked ones, so the two strongest signals occupy the two strongest positions. End-to-end answer accuracy rises from 0.72 to 0.81 (illustrative figures) without changing the retriever.
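The reordering described above can be sketched in a few lines. This is a minimal illustration, not code from the source: the function name and the assumption that passages arrive best-first are hypothetical.

```python
def reorder_for_position_bias(passages_ranked):
    """Place the top-ranked passage first and the second-ranked
    passage last, filling the middle with the remainder.

    Assumes passages_ranked is ordered best-first (hypothetical
    helper; not part of any specific RAG library)."""
    if len(passages_ranked) < 3:
        return list(passages_ranked)
    best, second, *rest = passages_ranked
    # Strongest signals occupy the strongest positions: start and end.
    return [best, *rest, second]

# With ten passages ranked p1 (best) through p10 (worst):
passages = [f"p{i}" for i in range(1, 11)]
print(reorder_for_position_bias(passages))
# p1 comes first, p2 comes last, p3..p10 fill the middle
```

The prompt is then built by concatenating the reordered passages and appending the user question, leaving the retriever itself untouched.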
