Contextual Compression
Contextual compression is a preprocessing step that sits between retrieval and generation in a RAG pipeline. Instead of passing the full retrieved chunks directly to the generator, a compression step filters or summarizes them against the specific query — dropping passages that turn out to be off-topic, extracting only the sentences that actually answer the question, or rewriting long passages into a tighter summary. The compressor can be rule-based (regex over keywords), embedding-based (a cosine-similarity filter against the query embedding), or a small LLM call. Contextual compression reduces generator input tokens, speeds up inference, lowers cost, and cuts the noise that degrades answer quality in long contexts. The tradeoff is that an over-aggressive compressor can strip information the generator actually needs, so tuning the compression threshold on an eval set matters.
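A minimal sketch of the embedding-based variant: sentences whose similarity to the query falls below a threshold are dropped before the context reaches the generator. To keep the example self-contained, a bag-of-words cosine similarity stands in for a real sentence-embedding model; the function names, the threshold value, and the regex-based sentence splitter are all illustrative choices, not a specific library's API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call a
    # sentence-embedding model here instead.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def compress(query: str, chunks: list[str], threshold: float = 0.2) -> list[str]:
    # Keep only sentences whose similarity to the query clears the
    # threshold; everything else is dropped before generation.
    q = embed(query)
    kept = []
    for chunk in chunks:
        for sentence in re.split(r"(?<=[.!?])\s+", chunk):
            if cosine(q, embed(sentence)) >= threshold:
                kept.append(sentence)
    return kept

chunks = [
    "The statute of limitations for fraud claims is six years. "
    "The court adjourned for lunch at noon."
]
print(compress("statute of limitations fraud claims", chunks))
```

The threshold is exactly the knob the definition warns about: set it too high and the compressor discards sentences the generator needs, which is why it should be tuned against an eval set rather than picked by hand.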
Example
A legal-research assistant retrieves ten 1,200-token case passages per query. Before contextual compression, the generator receives 12,000 tokens of context per call and often ignores relevant text buried deep in the prompt. A small compression step filters each passage down to the sentences semantically closest to the query, yielding roughly 3,000 tokens of context per call. Generator cost drops by 70%, p95 latency drops by 40%, and answer faithfulness on the eval set rises from 0.77 to 0.84 (illustrative figures) because the generator is no longer distracted by irrelevant surrounding text.
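The example above compresses to a target size rather than a similarity threshold: keep the sentences closest to the query until a per-passage token budget is spent. A hedged sketch of that budget-capped selection, again using word overlap as a toy stand-in for embedding similarity (the function name, budget value, and scoring rule are assumptions for illustration):

```python
import re

def compress_to_budget(query: str, passage: str, budget: int) -> str:
    """Keep the sentences sharing the most words with the query, up to
    `budget` whitespace-delimited tokens, preserving original sentence
    order so the surviving excerpt still reads coherently."""
    q = set(re.findall(r"[a-z']+", query.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", passage)
    # Rank sentence indices by overlap with the query, best first.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -len(q & set(re.findall(r"[a-z']+", sentences[i].lower()))),
    )
    kept, used = set(), 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= budget:
            kept.add(i)
            used += n
    return " ".join(sentences[i] for i in sorted(kept))

passage = (
    "Fraud claims must be filed within six years. "
    "The courthouse cafeteria closes at three. "
    "Tolling may extend the filing deadline."
)
print(compress_to_budget("filing deadline for fraud claims", passage, budget=14))
```

Setting the budget to roughly a quarter of the original passage length reproduces the 12,000-token to 3,000-token reduction described above.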