
Prompt Compression

Prompt compression covers techniques for shortening a prompt while preserving its essential meaning and effectiveness. Methods include summarizing lengthy context, removing redundant instructions, using shorter phrasing, applying token-efficient formatting, and running specialized compression models. Compression lets more information fit within a model's context window and lowers API costs, which are typically billed per token.
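The Python below is a minimal sketch of the rule-based end of this range, covering removing redundant instructions and using shorter phrasing. It rewrites a few verbose phrases and then drops any sentence that repeats an earlier one. The phrase table `SHORTER_PHRASING` and the helpers `compress_prompt` and `approx_tokens` are hypothetical. Counting whitespace-separated words only approximates what a real tokenizer would report.

```python
import re

# Hypothetical rewrite rules: verbose phrase pattern -> shorter replacement.
SHORTER_PHRASING = {
    r"\bplease note that\s+": "",
    r"\bin order to\b": "to",
    r"\bdue to the fact that\b": "because",
}


def approx_tokens(text: str) -> int:
    """Approximate token count as whitespace-separated words (real tokenizers differ)."""
    return len(text.split())


def compress_prompt(prompt: str) -> str:
    """Shorten verbose phrasing, then drop sentences that repeat an earlier one."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", prompt.strip())
    seen: set[str] = set()
    kept: list[str] = []
    for sentence in sentences:
        # Rewrite first, so sentences that differ only in filler become duplicates.
        for pattern, replacement in SHORTER_PHRASING.items():
            sentence = re.sub(pattern, replacement, sentence, flags=re.IGNORECASE)
        sentence = " ".join(sentence.split())  # collapse runs of whitespace
        if not sentence:
            continue
        sentence = sentence[0].upper() + sentence[1:]  # re-capitalize after removals
        key = sentence.lower()
        if key not in seen:  # case-insensitive duplicate check
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)
```

For example, `compress_prompt("Please note that answers must be short. Answers must be short. Cite sources in order to help the reader.")` returns `"Answers must be short. Cite sources to help the reader."`. This cuts the approximate count from 19 words to 10. Summarization and model-based compression can go further, but they need a language model or a trained compressor rather than fixed rules.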

Example

A retrieval-augmented generation (RAG) system retrieves 5 long documents (12,000 tokens in total) as context for a question. A prompt compression step distills them into the 3 most relevant passages, with key sentences highlighted, and cuts the context to 3,000 tokens. That 75% reduction (a 4:1 compression ratio) brings the context within the token budget while retaining the information needed for an accurate answer.
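The selection step in this example can be approximated with a greedy, budgeted filter. Each retrieved passage gets a relevance score against the question, and the highest-scoring passages are kept as long as they fit a token budget. The sketch below uses plain word overlap as the relevance score and whitespace word counts as token counts. Both are simplifying stand-ins: a production system would more likely use embedding similarity or a reranker, along with the model's own tokenizer. `select_passages` and its `budget` and `max_passages` parameters are hypothetical. The sketch does not implement the sentence highlighting described in the example.

```python
import re


def approx_tokens(text: str) -> int:
    """Approximate token count as whitespace-separated words (real tokenizers differ)."""
    return len(text.split())


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric words, used for a simple lexical-overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_passages(
    question: str, passages: list[str], budget: int, max_passages: int = 3
) -> list[str]:
    """Greedily keep the passages sharing the most terms with the question, within budget."""
    q_terms = terms(question)
    scores = [len(q_terms & terms(p)) for p in passages]
    # Rank passage indices by score, highest first (ties keep original order).
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    chosen: list[int] = []
    used = 0
    for i in ranked:
        if scores[i] == 0:
            break  # remaining passages share no terms with the question
        cost = approx_tokens(passages[i])
        if used + cost <= budget:
            chosen.append(i)
            used += cost
            if len(chosen) == max_passages:
                break
    # Restore original document order so the compressed context reads naturally.
    return [passages[i] for i in sorted(chosen)]
```

The function returns the chosen passages in their original order rather than by score, so the compressed context stays closer to how the source documents read. Setting `budget=3000` and `max_passages=3` roughly mirrors the numbers in the example above, keeping in mind that word counts are only a proxy for tokens.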
