Reranking
Reranking is a secondary scoring pass over an initial set of retrieval candidates to improve their ordering before they are handed to the generator. A typical flow is: vector search returns the top 50 candidates by embedding similarity, then a reranker — usually a cross-encoder model that reads the query and document together — re-scores each pair more carefully, and only the new top 5 or 10 are used as context. Cross-encoders are more accurate than bi-encoder embedding similarity because they attend jointly over the query and document rather than scoring each independently in a shared space, at the cost of being too slow to run over the whole corpus. Reranking adds inference latency and cost but usually yields a material quality jump, especially on queries where the right document is in the candidate set but not at the top.
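The retrieve-then-rerank flow above can be sketched in a few lines. This is a minimal illustration, not any particular library's API: cross_encoder_score is a toy lexical-overlap stand-in for a real cross-encoder model, and rerank is a hypothetical helper that re-scores candidates jointly with the query and keeps the best few.

```python
def cross_encoder_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: fraction of query terms that
    # appear in the document. A real reranker would run a model over
    # the concatenated (query, document) pair instead.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    """Re-score each candidate jointly with the query; keep the top_k best."""
    scored = [(cross_encoder_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

# First-stage retrieval (e.g. vector search) would supply these candidates.
candidates = [
    "billing and invoices faq",
    "password policy overview",
    "how to reset your password",
]
print(rerank("reset my password", candidates, top_k=2))
```

The structure is the same whatever model does the scoring: the expensive joint scorer only ever sees the few dozen candidates the first stage returned, never the whole corpus.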
Example
A support-docs assistant retrieves 50 candidate passages per question with an embedding model, then passes them through a small cross-encoder reranker. Before reranking, the correct passage is in the top 5 on 62% of eval questions; after reranking, it is in the top 5 on 88%. End-to-end answer accuracy improves by 14 points, at the cost of roughly 200ms of added per-query latency.