Question 1

What is ColBERT (Late Interaction Retrieval)?

Accepted Answer

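ColBERT (Contextualized Late Interaction over BERT) is a neural retrieval architecture that sits between bi-encoders and cross-encoders. Like a bi-encoder, it encodes queries and documents independently, so document representations can be computed once at index time. Like a cross-encoder, it compares query and document at the level of individual tokens. "Late interaction" names this design: the query and document interact only after each has been encoded, through a cheap token-level similarity computation rather than a joint forward pass.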

Question 2

How does ColBERT (Late Interaction Retrieval) work?

Accepted Answer

At index time, every document is run through a BERT-style encoder and stored as a matrix of per-token embeddings rather than a single pooled vector. At query time, the query is encoded the same way into per-token embeddings. Because the document matrices were precomputed, no joint forward pass over each query-document pair is needed.

Relevance is the MaxSim score: for each query token, take its maximum cosine similarity against any token of the candidate document, then sum those maxima across all query tokens.

Retrieval runs in two stages. An approximate-nearest-neighbor search over the individual document-token embeddings gathers candidate documents, and an exact MaxSim computation then reranks those candidates. This keeps query latency close to that of bi-encoder retrieval while recovering most of the accuracy of a cross-encoder.
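The scoring rule and the two-stage pipeline described above can be sketched in a few lines of plain Python. In symbols, the score is S(q, d) = Σ_i max_j cos(q_i, d_j), summing over query tokens i and taking the best document token j for each. This is a minimal sketch, not the ColBERT library's API: the helper names (normalize, dot, maxsim, retrieve), the parameter k_per_token, and the toy two-dimensional embeddings are hypothetical, and a brute-force nearest-neighbor scan stands in for a real approximate-nearest-neighbor index.

```python
import heapq
import math


def normalize(v):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query, doc):
    """Late-interaction score: for each query token, its best match in doc, summed.

    query and doc are lists of unit-length token embeddings (one row per token).
    """
    return sum(max(dot(q, d) for d in doc) for q in query)


def retrieve(query, index, k_per_token=2, top_n=3):
    """Two-stage retrieval over a hypothetical index: {doc_id: [token embeddings]}."""
    # Flatten the index into (doc_id, token_embedding) pairs.
    tokens = [(doc_id, tok) for doc_id, doc in index.items() for tok in doc]

    # Stage 1: for each query token, find the nearest document tokens (brute force
    # here, standing in for an ANN index) and keep the documents they belong to.
    candidates = set()
    for q in query:
        nearest = heapq.nlargest(k_per_token, tokens, key=lambda pair: dot(q, pair[1]))
        candidates.update(doc_id for doc_id, _ in nearest)

    # Stage 2: exact MaxSim rerank, computed only for the candidate documents.
    scored = [(doc_id, maxsim(query, index[doc_id])) for doc_id in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy example with hypothetical 2-d token embeddings.
index = {
    "doc_a": [normalize([1.0, 0.0]), normalize([0.0, 1.0])],
    "doc_b": [normalize([1.0, 1.0])],
    "doc_c": [normalize([-1.0, 0.0])],
}
query = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
print(retrieve(query, index))
```

With k_per_token=2, doc_c's only token is never among the nearest neighbors of either query token, so it is dropped in the first stage and never scored. The exact rerank then places doc_a (score 2.0, a perfect match for both query tokens) ahead of doc_b (about 1.41). In a real index the first stage is what keeps latency low, since MaxSim runs only over the short candidate list.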

Question 3

Can you give an example of ColBERT (Late Interaction Retrieval)?

Accepted Answer

Consider a biomedical-search team that finds single-vector bi-encoders miss queries in which a rare technical term ("mTOR inhibitor") must align precisely with a specific passage rather than with the document's overall topic. The team migrates to a ColBERTv2 index with residual compression. Index storage for the same corpus grows from 40GB to roughly 300GB. In this illustrative scenario, Recall@10 on rare-term queries improves from 0.68 to 0.83, closing most of the gap to a cross-encoder reranker at a fraction of the query-time cost, so cross-encoder reranking becomes optional rather than necessary.
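The storage growth in this example comes from keeping one embedding per token instead of one per document. A back-of-envelope estimator makes that arithmetic explicit. It is a rough sketch under stated assumptions, not a description of any particular index format: token embeddings are taken to be 128-dimensional; residual compression is modelled as a 4-byte centroid ID plus residual_bits bits per dimension (an approximation of the ColBERTv2 scheme); uncompressed token vectors are fp16; the bi-encoder baseline stores one 768-dimensional fp32 vector per passage; and the corpus size is hypothetical. Overheads such as centroid tables and inverted lists are ignored, and the function names are illustrative only.

```python
# Back-of-envelope storage estimate for a late-interaction index.
# Assumptions (hypothetical, not the ColBERT library's API): 128-dim token embeddings;
# residual compression = 4-byte centroid ID + `residual_bits` bits per dimension;
# uncompressed vectors are fp16; index overheads (centroids, inverted lists) are ignored.


def bytes_per_token(dim=128, residual_bits=None, centroid_id_bytes=4):
    """Bytes to store one token embedding; residual_bits=None means uncompressed fp16."""
    if residual_bits is None:
        return dim * 2  # fp16: 2 bytes per dimension
    return centroid_id_bytes + dim * residual_bits / 8


def late_interaction_index_gb(num_passages, avg_tokens, **kwargs):
    """One embedding per token, so size scales with passages x tokens per passage."""
    return num_passages * avg_tokens * bytes_per_token(**kwargs) / 1e9


def single_vector_index_gb(num_passages, dim=768, bytes_per_value=4):
    """One pooled fp32 vector per passage, as in a typical bi-encoder index."""
    return num_passages * dim * bytes_per_value / 1e9


# Hypothetical corpus: 10 million passages averaging 100 tokens each.
n, toks = 10_000_000, 100
print(f"bi-encoder (768-d fp32):        {single_vector_index_gb(n):.1f} GB")
print(f"late interaction, fp16:         {late_interaction_index_gb(n, toks):.1f} GB")
print(f"late interaction, 2-bit resid.: {late_interaction_index_gb(n, toks, residual_bits=2):.1f} GB")
print(f"late interaction, 1-bit resid.: {late_interaction_index_gb(n, toks, residual_bits=1):.1f} GB")
```

Under these assumptions, compression shrinks each token from 256 bytes to 36 (2-bit) or 20 (1-bit) bytes. The multiplier over a single-vector index is driven mainly by the average number of tokens per passage and the residual bit width, so real corpora can land far from these hypothetical figures.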
