Question 1

What is BM25?

Accepted Answer

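BM25 (Okapi BM25) is the dominant sparse-retrieval ranking function and the default scoring function in Lucene, Elasticsearch, and OpenSearch. Postgres's built-in full-text search ranks with ts_rank rather than BM25, so Postgres setups typically get BM25 through an extension.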

Question 2

How does BM25 work?

Accepted Answer

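BM25 scores each document by summing, over the query terms it contains, an inverse-document-frequency (IDF) weight multiplied by a term-frequency factor that saturates as the term repeats and is normalized by document length. Because rare terms carry the highest IDF weight, BM25 is excellent at rare words, product codes, identifiers, exact phrases, and any term the embedding model has not seen enough of during pretraining, where it routinely beats dense retrieval.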
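The scoring BM25 performs (an IDF weight times a saturating, length-normalized term frequency, summed over query terms) can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular engine: the function names `tokenize` and `bm25_scores` are hypothetical, the whitespace tokenizer is a simplifying assumption, and k1 = 1.2 and b = 0.75 are commonly used defaults rather than values taken from the text above.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. Real engines use richer analyzers.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query string."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant: rare terms get the largest weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "red running shoes",
    "SKU-48A-V2 red shoes",
    "blue winter jacket",
]
sku_scores = bm25_scores("SKU-48A-V2", docs)
red_scores = bm25_scores("red", docs)
```

In this toy corpus only the second document matches the SKU query, and because the SKU token appears in one document while "red" appears in two, the SKU match earns a higher score than the "red" match on the same document.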

Question 3

Can you give an example of BM25?

Accepted Answer

An e-commerce search team evaluates a pure-vector retriever and sees recall on category queries go up, but recall on SKU-style queries like "SKU-48A-V2" collapses because the embedding model never learned those tokens well. They switch to hybrid search that combines BM25 and vector results with reciprocal rank fusion (RRF). Top-5 recall on descriptive queries holds at the embedding-model baseline, recall on SKU-style and brand-specific queries recovers to essentially 1.0, and overall search success rises from 0.82 to 0.94 (illustrative figures) without any change to the embedding model.
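Reciprocal rank fusion, the merging step in this example, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `reciprocal_rank_fusion`, the document IDs, and the two input rankings are hypothetical, and k = 60 is a commonly used constant, not a value from the example above.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1; documents are returned by descending total.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical rankings for one query, best match first.
bm25_ranking = ["sku_doc", "doc_a", "doc_b"]
vector_ranking = ["doc_a", "doc_c", "sku_doc"]

hybrid = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Because RRF uses only rank positions, the BM25 and vector scores never need to be put on a common scale. Documents that appear in both lists, here `doc_a` and `sku_doc`, end up ahead of documents that appear in only one.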
