Multi-Query Retrieval
Multi-query retrieval is a RAG pattern that hedges against the single-query-phrasing failure mode of standard retrieval. Before hitting the index, the pipeline uses an LLM to generate several paraphrased or reframed versions of the user query: a literal rephrase, a broader abstraction, a narrower specialization, a keyword-only variant. Each version is run through the retriever independently, and the resulting candidate lists are unioned or merged with reciprocal rank fusion. The pattern improves recall when the user's single phrasing happens to use vocabulary that is far from the corpus: jargon mismatch, indirect wording, questions phrased from a user's mental model rather than the authors'. The cost is one additional LLM call to generate queries plus several retrievals per user request, which matters for latency-sensitive surfaces but is inexpensive for knowledge work where retrieval quality dominates.
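The pipeline above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `generate_variants` and `retrieve` are hypothetical hooks standing in for the LLM call and the index lookup, and the merge step is plain reciprocal rank fusion with the conventional k=60 constant.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked candidate lists into one ranking.

    Each document's fused score is the sum over lists of 1 / (k + rank),
    so a document ranked highly in any single list floats toward the top.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def multi_query_retrieve(query, generate_variants, retrieve, top_k=3):
    """Run the retriever once per query variant, then fuse the results.

    `generate_variants` is a hypothetical LLM call returning paraphrases;
    `retrieve` is a hypothetical search over the index (vector or BM25).
    """
    variants = [query] + generate_variants(query)
    ranked_lists = [retrieve(v) for v in variants]
    return reciprocal_rank_fusion(ranked_lists)[:top_k]
```

RRF is a convenient merge choice here because it needs only ranks, not scores, so it works even when the per-variant retrievals return incomparable similarity values.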
Example
A customer-support search gets the query "why is my thing broken?" — natural language but poorly aligned with docs that use product codes and formal feature names. A standard retriever returns generic troubleshooting pages. The multi-query step generates four variants: "device not working", "error code troubleshooting", "device status LED red diagnosis", and "support reset procedure". Unioned retrieval now pulls the right model-specific article via the third variant. In an illustrative measurement, top-3 recall on conversational queries rises from 0.61 to 0.84, at the cost of one extra ~200 ms LLM call per request.
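The union step in this example can be made concrete. The document ids and per-variant result lists below are invented for illustration; the point is that deduplicating while preserving first-seen order lets the model-specific article surface even though only one variant retrieved it.

```python
def union_preserving_order(result_lists):
    """Union candidate lists, keeping each doc's first appearance order."""
    seen, merged = set(), []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Hypothetical per-variant results for the support example:
lists = [
    ["generic-troubleshooting", "faq-overview"],       # "device not working"
    ["generic-troubleshooting", "error-codes-index"],  # "error code troubleshooting"
    ["kb-red-led-article", "generic-troubleshooting"], # "device status LED red diagnosis"
    ["reset-procedure"],                               # "support reset procedure"
]
merged = union_preserving_order(lists)
# The model-specific article ("kb-red-led-article") is now in the candidate pool.
```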