Bi-Encoder
A bi-encoder is a dual-tower transformer architecture in which the query and the document are encoded independently, by the same (or twin) encoder, into separate fixed-size vectors; relevance is then computed as the cosine or dot-product similarity between those vectors. Because the two sides never attend to each other, documents can be embedded once at index time and stored in a vector database; at query time only the query needs encoding, and retrieval reduces to nearest-neighbor search, cheap enough to run over millions of documents in milliseconds. The tradeoff is accuracy: without cross-attention, the model cannot capture joint query-document interactions such as "the document talks about X but denies Y", so bi-encoders typically underperform cross-encoders on reranking benchmarks. Bi-encoders are the standard architecture for first-stage retrieval and for embedding-model APIs generally.
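As a minimal sketch of this split between index-time and query-time work, the snippet below uses a toy bag-of-words "encoder" in place of a real transformer; the corpus, vocabulary, and dimensionality are invented for illustration, and only the structure (independent encoding, then dot-product scoring against a precomputed matrix) matches a real bi-encoder:

```python
import numpy as np

# Toy corpus; a real system would embed millions of documents.
DOCS = [
    "red running shoes for men",
    "wireless noise cancelling headphones",
    "trail running shoes waterproof",
]

# Fixed vocabulary built from the corpus, so encoding is deterministic.
VOCAB = sorted({tok for doc in DOCS for tok in doc.lower().split()})

def toy_encode(text):
    """Stand-in for a transformer encoder: a unit-normalized
    bag-of-words vector. A real bi-encoder would pool transformer
    token states into a fixed-size embedding instead."""
    vec = np.zeros(len(VOCAB))
    for tok in text.lower().split():
        if tok in VOCAB:
            vec[VOCAB.index(tok)] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Index time: embed every document once and store the matrix.
doc_matrix = np.stack([toy_encode(d) for d in DOCS])  # shape (n_docs, dim)

# Query time: encode only the query. With unit vectors, the dot
# product equals cosine similarity, so scoring is one matrix product.
query_vec = toy_encode("running shoes")
scores = doc_matrix @ query_vec
top = np.argsort(-scores)[:2]  # indices of the two nearest documents
```

Both shoe documents share tokens with the query, so `top` comes back as `[2, 0]`, while the headphones document, which shares none, scores zero. Note that nothing in scoring ever looks at a query token and a document token together, which is exactly the limitation the cross-attention discussion above describes.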
Example
A product-search team uses a bi-encoder to embed 4M product descriptions offline. At query time they embed the incoming query in roughly 5 ms, run approximate nearest-neighbor search in roughly 15 ms, and return the top 100 hits, for a total first-stage latency under 25 ms. A cross-encoder reranker then reorders the top 50 in another 100 ms. The bi-encoder does the cheap, broad sweep over millions of documents; the cross-encoder does the expensive, accurate reordering over 50. Neither architecture alone would be acceptable: a bi-encoder alone sacrifices accuracy, while a cross-encoder alone, at the roughly 2 ms per query-document pair implied by these numbers, would take thousands of seconds to score all 4M documents for a single query.
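The two-stage pipeline above can be sketched as follows. The random embeddings, the scaled-down index size, and the hash-based stand-in for a cross-encoder score are all placeholders for real models; the point is the control flow, a broad bi-encoder sweep followed by a narrow cross-encoder rerank:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: document embeddings precomputed offline by a bi-encoder.
# Random unit vectors stand in for real embeddings; 4000 docs stands in
# for the 4M-document index in the example.
N_DOCS, DIM = 4000, 32
doc_embs = rng.normal(size=(N_DOCS, DIM))
doc_embs /= np.linalg.norm(doc_embs, axis=1, keepdims=True)

def cross_encoder_score(query_text, doc_id):
    """Stub for the expensive joint model. A real cross-encoder would
    feed query and document through one transformer together; here a
    deterministic pseudo-score keeps the sketch self-contained."""
    return float((int(doc_id) * 2654435761) % 1000) / 1000.0

def search(query_emb, first_stage_k=100, rerank_k=50):
    # Stage 1 (bi-encoder): one matrix product over the whole index,
    # then keep the top `first_stage_k` candidates.
    scores = doc_embs @ query_emb
    candidates = np.argsort(-scores)[:first_stage_k]
    # Stage 2 (cross-encoder): score only the top `rerank_k` candidates
    # with the expensive model and reorder them by that score.
    rerank_pool = candidates[:rerank_k]
    reranked = sorted(rerank_pool, key=lambda d: -cross_encoder_score("q", d))
    # Final ranking: reranked head, then the untouched first-stage tail.
    return list(reranked) + list(candidates[rerank_k:])

query_emb = rng.normal(size=DIM)
query_emb /= np.linalg.norm(query_emb)
results = search(query_emb)
```

The cost asymmetry lives in the two stages: stage 1 touches every document but only via a matrix product over precomputed vectors, while stage 2 runs the per-pair model but only 50 times.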