HyDE (Hypothetical Document Embeddings)
HyDE is a retrieval technique in which the language model first generates a hypothetical answer to the user's query, and then that hypothetical answer — not the original query — is embedded and used to retrieve real documents by vector similarity. The idea, introduced by Gao et al. in 2022, is that in embedding space a plausible-looking answer is often closer to the real supporting documents than a short, under-specified question. HyDE helps most when queries are terse and the documents that should match are long-form prose, where the vocabulary and sentence shape of the query and the document differ substantially. It is a prompt-time trick rather than a training change, so it composes cleanly with other retrieval upgrades.
Example
A research assistant is asked "effects of vitamin D on muscle recovery?" — a short noun phrase. Directly embedding the query returns mixed results. With HyDE, the model first drafts a paragraph-length hypothetical answer about vitamin D, muscle-protein synthesis, and recovery timelines; that paragraph is then embedded. Vector search against the hypothetical retrieves seven more on-topic studies in the top ten than the direct-query baseline did.
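The flow in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, `generate_hypothetical` is a hypothetical placeholder that returns a canned draft where a real system would call a language model, and the three-document corpus is invented purely for demonstration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words term counts (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_hypothetical(query):
    """Hypothetical placeholder for an LLM call: returns a canned draft answer."""
    return (
        "Vitamin D supplementation may support muscle protein synthesis "
        "and shorten recovery timelines after strenuous exercise."
    )


def retrieve(search_text, docs, k=1):
    """Rank docs by similarity to whatever text is embedded as the search vector."""
    vector = embed(search_text)
    return sorted(docs, key=lambda d: cosine(vector, embed(d)), reverse=True)[:k]


def hyde_retrieve(query, docs, k=1):
    """HyDE: embed a drafted answer instead of the raw query, then retrieve real documents."""
    return retrieve(generate_hypothetical(query), docs, k)


# Invented toy corpus, for illustration only.
DOCS = [
    "Supplementation with cholecalciferol improved muscle protein synthesis and "
    "shortened recovery timelines after strenuous exercise in athletes.",
    "Vitamin D fortified milk prices rose last quarter.",
    "Recovery of deleted files on Linux using extundelete.",
]
QUERY = "effects of vitamin D on muscle recovery?"

if __name__ == "__main__":
    print("direct:", retrieve(QUERY, DOCS))
    print("hyde:  ", hyde_retrieve(QUERY, DOCS))
```

In this toy corpus the terse query shares few words with the on-topic study, so direct retrieval ranks an off-topic document first, while the drafted paragraph overlaps heavily with the study's wording. With a real embedding model the gap is usually smaller and depends on how good the draft is.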