BM25 vs. Vector Search: Why Your RAG App Needs Both

Akram Chauhan

Have you ever typed something into a search bar and gotten results that were… technically correct, but completely missed the point? You search for "how to find similar articles without exact keywords," and the engine just stares back, blankly, because you didn't use the exact right words.

It’s a classic problem. On the flip side, sometimes you search for something vague, and the AI just gets it, pulling up a document that uses totally different words but perfectly captures your intent.

What you're seeing is the difference between two fundamentally different ways of thinking about search: matching keywords versus matching meaning.

For decades, the undisputed king of keyword matching has been an algorithm called BM25. It’s the default ranking function in Apache Lucene and in the search engines built on it, such as Elasticsearch, and it’s incredibly good at what it does. But it has a blind spot: it doesn’t understand what words mean.

That’s where the new kid on the block comes in: vector search, the retrieval method behind many Retrieval-Augmented Generation (RAG) apps. It’s all about understanding the vibe, the context, the semantic meaning.

So, how do they actually work? Let's pop the hood and take a look. We'll even run them head-to-head to see where each one wins and why, in the real world, you almost always want both.

Meet BM25: The Clever Keyword Counter

At its heart, BM25 is a really, really smart keyword-matching system. Imagine you have a giant library of documents. When you search for something, BM25 runs around and gives every single document a score based on your query. The highest-scoring documents get shown to you first.

To come up with that score, it asks three simple questions for each word in your search query:

  1. How often does this word show up in the document? (This is Term Frequency).
  2. How rare is this word across all the documents? (This is Inverse Document Frequency).
  3. Is this document super long compared to the others? (This is Length Normalization).

The magic is in how it balances these things.
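
For reference, here is the classic Okapi BM25 scoring function, which combines the three factors above. f(q, D) is how often query term q appears in document D, |D| is the document's length in tokens, avgdl is the average document length in the collection, N is the number of documents, and n(q) is how many of them contain q. The two tuning knobs are k1 (how quickly term frequency saturates) and b (how strongly length is penalized); values around k1 = 1.2 to 2.0 and b = 0.75 are common, and rank_bm25's BM25Okapi defaults to k1 = 1.5 and b = 0.75.

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q) = \ln \frac{N - n(q) + 0.5}{n(q) + 0.5}
```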

For term frequency, BM25 knows that a word appearing 10 times isn't 10 times more important than a word appearing once. The relevance score goes up quickly at first, but then it levels off. This is a brilliant little feature called "term frequency saturation," and it’s what stops people from gaming the system by just stuffing a keyword into a page a thousand times.
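
Here is that saturation curve in a few lines of Python. It is just the term-frequency part of BM25 with length normalization switched off (b = 0), using k1 = 1.5 (rank_bm25's default); the function name is made up for illustration.

```python
def tf_saturation(tf: int, k1: float = 1.5) -> float:
    """BM25 term-frequency weight with b = 0: tf * (k1 + 1) / (tf + k1)."""
    return tf * (k1 + 1) / (tf + k1)

# Weight grows quickly for the first few occurrences, then flattens out
for tf in [1, 2, 5, 10, 100, 1000]:
    print(f"tf={tf:>4}  weight={tf_saturation(tf):.3f}")
```

One occurrence is worth exactly 1.0, ten occurrences only about 2.17, and no count can push the weight past k1 + 1 = 2.5, which is why keyword stuffing stops paying off.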

Then there's the length penalty. A 50-page document has a much higher chance of containing your search term than a one-page document, just by sheer volume. BM25 adjusts for this, so shorter, more focused documents can still rise to the top.
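
To see the length penalty in numbers, this sketch adds the length factor back into the per-term weight (IDF left out). The lengths are hypothetical: a 250-token page and a 25,000-token report, each containing the search term 3 times, in a collection whose average document is 500 tokens. k1 = 1.5 and b = 0.75 mirror rank_bm25's defaults.

```python
def bm25_term_weight(tf: int, doc_len: int, avg_doc_len: float,
                     k1: float = 1.5, b: float = 0.75) -> float:
    """Per-term BM25 weight (IDF excluded), including length normalization."""
    length_ratio = doc_len / avg_doc_len
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * length_ratio))

AVG_LEN = 500  # hypothetical average document length, in tokens
short_doc = bm25_term_weight(tf=3, doc_len=250, avg_doc_len=AVG_LEN)
long_doc = bm25_term_weight(tf=3, doc_len=25_000, avg_doc_len=AVG_LEN)
print(f"short page: {short_doc:.3f}, long report: {long_doc:.3f}")

# With b = 0, document length no longer affects the weight at all
print(bm25_term_weight(3, 250, AVG_LEN, b=0.0) == bm25_term_weight(3, 25_000, AVG_LEN, b=0.0))
```

With the same three matches, the short page scores roughly 1.9 and the long report roughly 0.13, so the focused document wins. Setting b = 0 is the usual way to turn the penalty off when documents are all similar in size.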

Finally, and maybe most importantly, there's Inverse Document Frequency (IDF). This is what gives rare words their power. If you search for "the basics of retrieval augmented generation," the word "retrieval" is a much stronger signal than a common word like "the," which appears in nearly every document. IDF makes sure the unique, descriptive words carry more weight.
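
A quick sketch of two common IDF variants, on a hypothetical collection of 1,000 documents where "retrieval" appears in 3 and "the" appears in 990. The first is the classic Okapi form; it goes negative for terms found in more than half the documents, which is why rank_bm25's BM25Okapi replaces negative values with a small positive floor (epsilon times the average IDF). The second adds 1 inside the logarithm, the smoothed form Lucene uses, so every IDF stays positive. The function names are illustrative.

```python
import math

def idf_classic(num_docs: int, docs_with_term: int) -> float:
    """Okapi BM25 IDF: large for rare terms, negative for very common ones."""
    return math.log((num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

def idf_smoothed(num_docs: int, docs_with_term: int) -> float:
    """Variant with 1 + inside the log, which keeps every IDF above zero."""
    return math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

N = 1_000  # hypothetical collection size
for term, df in [("retrieval", 3), ("the", 990)]:
    print(f"{term:>10}: classic={idf_classic(N, df):.3f}  smoothed={idf_smoothed(N, df):.3f}")
```

Either way, the rare term ends up with an IDF above 5, while "the" contributes next to nothing (smoothed) or gets clamped (classic).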

But here’s the catch, and it's a big one: BM25 sees text as just a "bag of words." It has zero understanding of context, synonyms, or intent. Search for "bank," and a document about a river bank and one about a financial bank are indistinguishable to BM25: both simply contain the token "bank." It's a keyword matcher, not a mind reader.
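
A few lines of Python make the "bag of words" idea concrete. This is a simplified view that reuses the same regex tokenizer as the retriever built later in this article; it only shows what information BM25 keeps, not a full scorer.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Token counts with word order and context thrown away (BM25's view of a text)."""
    return Counter(re.findall(r"\w+", text.lower()))

# Word order is invisible: these two sentences produce the same bag
print(bag_of_words("The dog bit the man") == bag_of_words("The man bit the dog"))

# Both senses of "bank" land on the same key, so a query for "bank" can't tell them apart
river = bag_of_words("We had a picnic on the river bank")
finance = bag_of_words("She deposited her paycheck at the bank")
print(river["bank"], finance["bank"])
```

The first check prints True, and both documents report exactly one "bank," which is all the information BM25 ever sees about that word.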

Enter Vector Search: It’s All About the Vibe

This is where things get really interesting. Vector search takes a completely different approach. Instead of counting words, it tries to capture the meaning of the words.

Here’s how it works:

  1. You take a powerful AI model (an "embedding model").
  2. You feed it your query, and it spits out a long list of numbers, called a "vector." This vector is like a numerical fingerprint for the meaning of your query.
  3. You do the same thing for every single document in your library, creating a vector for each one.
  4. To find the best matches, you just find the document vectors that are "closest" to your query vector.

Think of it like a giant map. The embedding model places every document on this map. Documents about similar topics, like "heart attack" and "cardiac arrest," get placed right next to each other, even if they don't share any of the same keywords. Documents about unrelated topics, like "PostgreSQL databases" and "Python programming," are placed far apart.
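
Here is a toy version of steps 3 and 4 and of the map idea. The 3-dimensional vectors are invented by hand purely for illustration, not produced by any model; a real embedding model such as text-embedding-3-small returns much longer vectors (1,536 dimensions by default for that model).

```python
import numpy as np

# Hand-made vectors, NOT real model output. Loosely, the axes stand for
# "medicine", "databases", and "programming".
doc_vectors = {
    "heart attack symptoms": np.array([0.90, 0.10, 0.05]),
    "cardiac arrest treatment": np.array([0.85, 0.15, 0.10]),
    "PostgreSQL indexing tips": np.array([0.05, 0.90, 0.30]),
    "Python list comprehensions": np.array([0.10, 0.30, 0.90]),
}
# Pretend this is the embedding of the query "myocardial infarction"
query_vector = np.array([0.88, 0.12, 0.07])

def nearest(query: np.ndarray, docs: dict[str, np.ndarray],
            top_k: int = 2) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity to the query and keep the top_k."""
    names = list(docs)
    matrix = np.stack([docs[name] for name in names])  # shape (n_docs, dim)
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return [(names[i], float(sims[i])) for i in order]

print(nearest(query_vector, doc_vectors))
```

The two cardiology documents come back on top even though neither shares a word with "myocardial infarction"; with real embeddings, the model is what puts them close together on the map.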

When you type in a query, the model places your query on the map, too. The search then becomes incredibly simple: just find the closest documents. This "closeness" is usually measured by something called "cosine similarity"—which is just a fancy way of asking, "how much are these two vectors pointing in the same direction?"
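
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it runs from -1 (opposite directions) through 0 (perpendicular) to 1 (same direction), and it ignores how long the vectors are. The helper below uses the same formula as the cosine_similarity function defined later in this article, applied to a few hand-picked vectors.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(theta) = (a . b) / (|a| * |b|)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 3.0])
same_direction = cosine_similarity(v, 10 * v)  # 10x longer, same direction: about 1.0
perpendicular = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # 0.0
opposite = cosine_similarity(v, -v)  # about -1.0
print(same_direction, perpendicular, opposite)
```

Because only direction matters, a short chunk and a long chunk about the same topic can still score as near-identical matches.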

The tradeoff? BM25 is lightweight, fast, and easy to understand. It’s just math on word counts. Vector search requires a heavy-duty AI model, which means API calls, potential costs, and a bit more latency. It's also harder to explain exactly why it decided two things were similar.

Neither one is strictly better. They just fail in completely different ways, which is exactly why modern, production-grade systems are increasingly using a "hybrid" approach that combines the strengths of both.

Let's See Them in Action: A Head-to-Head Showdown

Talk is cheap, so let's actually build these two retrievers and pit them against each other. We'll use a small "corpus" of 12 text chunks on various tech topics. This will be our mini knowledge base.

First, let's get our tools ready. We'll need a couple of Python libraries.

pip install rank_bm25 openai numpy

And we'll need to set up our connection to OpenAI to generate the embeddings.

import os
import re
from getpass import getpass

import numpy as np
from openai import OpenAI
from rank_bm25 import BM25Okapi

# OpenAI() reads OPENAI_API_KEY from the environment; only prompt if it isn't set yet
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API Key: ")
client = OpenAI()

Our Mini Knowledge Base

Here are the 12 documents we'll be searching through. I've deliberately mixed related and unrelated topics to see how our retrievers handle them.

# The trailing number on each line is the chunk ID used in the results below
CHUNKS = [
    "Python is a high-level, interpreted programming language...",  # 0
    "Machine learning is a subset of artificial intelligence...",  # 1
    "BM25 stands for Best Match 25. It is a bag-of-words retrieval function...",  # 2
    "Transformer architecture introduced the self-attention mechanism...",  # 3
    "Vector embeddings represent text as dense numerical vectors...",  # 4
    "TF-IDF stands for Term Frequency-Inverse Document Frequency...",  # 5
    "Retrieval-Augmented Generation (RAG) combines a retrieval system...",  # 6
    "Django is a high-level Python web framework...",  # 7
    "Cosine similarity measures the angle between two vectors...",  # 8
    "Gradient descent is an optimization algorithm...",  # 9
    "PostgreSQL is an open-source relational database...",  # 10
    "Sparse retrieval methods like BM25 rely on exact keyword matches...",  # 11
]

Building the BM25 Retriever

Setting up BM25 is straightforward. We just need to "tokenize" our text, which basically means chopping each chunk into a list of lowercase words. Then we feed that into the BM25Okapi class from the rank_bm25 library, which does all the heavy lifting of calculating IDF scores and document lengths for us.

def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r'\w+', text.lower())

# Build the index from our tokenized chunks
tokenized_corpus = [tokenize(chunk) for chunk in CHUNKS]
bm25 = BM25Okapi(tokenized_corpus)

def bm25_search(query: str, top_k: int = 3) -> list[dict]:
    """Return top-k chunks ranked by BM25 score."""
    tokens = tokenize(query)
    scores = bm25.get_scores(tokens)
    ranked_indices = np.argsort(scores)[::-1][:top_k]
    return [
        {"chunk_id": int(i), "score": round(float(scores[i]), 4), "text": CHUNKS[i]}
        for i in ranked_indices
    ]

# Let's give it a quick test run
results = bm25_search("how does BM25 rank documents", top_k=3)
print("BM25 test -- query: 'how does BM25 rank documents'")
for r in results:
    print(f" [{r['chunk_id']}] score={r['score']} {r['text'][:70]}...")

Building the Embedding Retriever

The process for vector search is totally different. Instead of tokenizing, we're going to make an API call to OpenAI for every single chunk to convert it into an embedding. We'll store these embeddings in memory.

EMBED_MODEL = "text-embedding-3-small"

def get_embedding(text: str) -> np.ndarray:
    response = client.embeddings.create(model=EMBED_MODEL, input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# This is our one-time "indexing" step. It can be slow and costly for large datasets!
print("Building embedding index...")
chunk_embeddings = [get_embedding(chunk) for chunk in CHUNKS]
print("Done.")

def embedding_search(query: str, top_k: int = 3) -> list[dict]:
    """Return top-k chunks ranked by cosine similarity."""
    query_emb = get_embedding(query)
    scores = [cosine_similarity(query_emb, emb) for emb in chunk_embeddings]
    ranked_indices = np.argsort(scores)[::-1][:top_k]
    return [
        {"chunk_id": int(i), "score": round(float(scores[i]), 4), "text": CHUNKS[i]}
        for i in ranked_indices
    ]

# And a quick test for this one too
results = embedding_search("how does BM25 rank documents", top_k=3)
print("\nEmbedding test -- query: 'how does BM25 rank documents'")
for r in results:
    print(f" [{r['chunk_id']}] score={r['score']} {r['text'][:70]}...")
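
The embedding_search function above scores chunks one at a time in a Python loop, which is fine for 12 chunks. A common alternative, sketched here as an optional variant rather than the article's own code, stacks all embeddings into one matrix, scales each row to unit length once, and scores every chunk with a single matrix-vector product, since for unit-length vectors cosine similarity is just the dot product. OpenAI describes its embeddings as already normalized to length 1, but normalizing explicitly keeps the sketch correct for other models. The function names build_unit_matrix and top_k_cosine are hypothetical.

```python
import numpy as np

def build_unit_matrix(embeddings: list[np.ndarray]) -> np.ndarray:
    """Stack embeddings into an (n_chunks, dim) matrix whose rows have length 1."""
    matrix = np.vstack(embeddings)
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

def top_k_cosine(query_emb: np.ndarray, unit_matrix: np.ndarray,
                 top_k: int = 3) -> list[tuple[int, float]]:
    """Return (chunk_id, cosine score) pairs for the top_k most similar rows."""
    query_unit = query_emb / np.linalg.norm(query_emb)
    scores = unit_matrix @ query_unit  # one cosine score per chunk
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in ranked]
```

With the objects defined above, you would build the matrix once at indexing time, e.g. chunk_matrix = build_unit_matrix(chunk_embeddings), and then call top_k_cosine(get_embedding(query), chunk_matrix) per query.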

The Results Are In: Side-by-Side Comparison

Now for the fun part. Let's run a few queries through both systems and see what happens.
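
The article doesn't show the script that printed the tables below, so here is a minimal, hypothetical helper in the same spirit. It takes the lists of result dicts returned by bm25_search and embedding_search, cuts each chunk to its first 19 characters as the tables do, and marks the ranks where both retrievers agree.

```python
def compare_side_by_side(query: str, bm25_results: list[dict],
                         emb_results: list[dict], width: int = 19) -> str:
    """Format two ranked result lists as rows, flagging ranks where they agree."""
    lines = [f'QUERY: "{query}"']
    for rank, (b, e) in enumerate(zip(bm25_results, emb_results), start=1):
        left = f"[{b['chunk_id']:02d}] {b['score']:.4f} {b['text'][:width]}..."
        right = f"[{e['chunk_id']:02d}] {e['score']:.4f} {e['text'][:width]}..."
        flag = "same" if b["chunk_id"] == e["chunk_id"] else ""
        row = f"#{rank}  {left}  {right}  {flag}"
        lines.append(row.rstrip())  # drop trailing spaces when there is no flag
    return "\n".join(lines)
```

Usage would look like q = "what is RAG and why does it reduce hallucinations" followed by print(compare_side_by_side(q, bm25_search(q), embedding_search(q))).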

Query 1: A Keyword-Heavy Search

Let's start with a query that's basically a list of keywords: "BM25 term frequency inverse document frequency"

======================================================================
 QUERY: "BM25 term frequency inverse document frequency"
======================================================================

 Rank  BM25 (keyword)                      Embedding RAG (semantic)
 ────  ──────────────────────────────────  ──────────────────────────────────
 #1    [02] 3.9961 BM25 stands for Bes...  [02] 0.8105 BM25 stands for Bes...  same
 #2    [05] 3.4542 TF-IDF stands for T...  [05] 0.7675 TF-IDF stands for T...  same
 #3    [11] 2.1100 Sparse retrieval me...  [11] 0.6186 Sparse retrieval me...  same

No surprise here. This query is a softball for BM25. It sees the exact keywords "BM25," "term," "frequency," and "document" and immediately surfaces the most relevant chunks. The embedding search agrees completely, because those chunks are also semantically the closest. Winner: A tie, but BM25 was faster and cheaper.

Query 2: A Conceptual Search

Now let's ask a question using natural language, without a ton of specific keywords: "what is RAG and why does it reduce hallucinations"

======================================================================
 QUERY: "what is RAG and why does it reduce hallucinations"
======================================================================

 Rank  BM25 (keyword)                      Embedding RAG (semantic)
 ────  ──────────────────────────────────  ──────────────────────────────────
 #1    [06] 1.0799 Retrieval-Augmented...  [06] 0.7937 Retrieval-Augmented...  same
 #2    [01] 0.0000 Machine learning is...  [04] 0.5471 Vector embeddings r...
 #3    [02] 0.0000 BM25 stands for Bes...  [11] 0.5367 Sparse retrieval me...

This is where things get interesting! Both retrievers put chunk #6 on top: BM25 because it contains the keyword "RAG," and the embedding retriever because its meaning matches the question. But look at the rest. BM25 completely falls apart. No other chunk shares a meaningful keyword with the query, so it fills slots #2 and #3 with zero-score chunks whose order is effectively arbitrary.

The embedding search, however, understood the concept. RAG systems usually retrieve context with vector embeddings (chunk #4), and sparse keyword retrieval (chunk #11) is the classic alternative to that kind of retrieval, so both chunks are genuinely related to the question even without meaningful keyword overlap. Winner: Vector Search, by a landslide.

Query 3: A Synonym-Based Search

Let's try one more, this time describing a concept instead of naming it: "measuring the angle between vectors" is the definition of cosine similarity, but the query never uses the term "cosine similarity."

======================================================================
 QUERY: "measuring the angle between vectors"
======================================================================

 Rank  BM25 (keyword)                      Embedding RAG (semantic)
 ────  ──────────────────────────────────  ──────────────────────────────────
 #1    [08] 0.9373 Cosine similarity m...  [08] 0.8654 Cosine similarity m...  same
 #2    [04] 0.9038 Vector embeddings r...  [04] 0.7289 Vector embeddings r...  same
 #3    [09] 0.0000 Gradient descent is...  [01] 0.4905 Machine learning is...

Again, a fascinating result. BM25 does well here, because "angle," "between," and "vectors" appear verbatim in the top result (chunk #8), and "vectors" also appears in chunk #4. The embedding retriever returns the same top two for a different reason: it places "measuring the angle between vectors" right next to the definition of cosine similarity, a core concept behind vector embeddings. The real difference is at #3, where vector search picks "machine learning," a loosely related topic, while BM25 falls back to a zero-score filler chunk. Winner: Vector Search, but only narrowly.

So, Who Wins? The Real Answer is… Both.

As you can see, it's not about one being "better" than the other. It's about using the right tool for the right job.

  • BM25 is your go-to for precision. When users search for specific product codes, error messages, or names, you want the exact match that keyword search provides. It’s fast, reliable, and perfectly literal.
  • Vector Search is your go-to for discovery. When users are exploring a topic, asking conceptual questions, or don't know the exact terminology, semantic search is what bridges the gap and delivers those magical, "how did it know that?" results.

This is why many production systems use hybrid search. You run the query through both BM25 and a vector search retriever, then merge the two ranked lists with a fusion step, such as Reciprocal Rank Fusion (RRF) or a weighted sum of normalized scores. This gives you the best of both worlds: the pinpoint accuracy of keyword matching and the contextual understanding of semantic search.
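
One widely used way to combine the two ranked lists is Reciprocal Rank Fusion (RRF). It ignores raw scores, which is convenient because BM25 scores and cosine similarities live on different scales, and looks only at ranks: each list gives a document 1 / (k + rank), with k = 60 as the conventional constant. This is a minimal sketch of one fusion option, not the only one; the chunk IDs come from query 2 above.

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60,
                           top_k: int = 3) -> list[tuple[int, float]]:
    """Fuse several ranked lists of doc IDs; a higher fused score is better."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]

# Query 2: BM25 ranked [6, 1, 2], but only chunk 6 had a non-zero BM25 score,
# so the zero-score filler is dropped before fusing.
bm25_ids = [6]
embedding_ids = [6, 4, 11]
print(reciprocal_rank_fusion([bm25_ids, embedding_ids]))
```

Chunk 6, which both retrievers ranked first, comes out on top, followed by the embedding retriever's picks. Dropping zero-score BM25 hits first matters: left in, chunks 1 and 2 would earn RRF credit for ranks that were essentially arbitrary.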

So the next time you're building a RAG pipeline or just wondering how your search engine works, remember the two players working behind the scenes. One is a meticulous librarian, checking every index card for your exact words. The other is a well-read expert who understands the concepts and can recommend the right book, even if you can't remember the title. And together, they're a pretty unstoppable team.

© 2026 Aicosoft. All rights reserved.