Tags: GraphRAG, RAG, knowledge graph, retrieval, context engineering, vector search

GraphRAG: When Knowledge Graphs Beat Chunk-Based Retrieval

GraphRAG builds a knowledge graph from the source corpus and uses its structure as retrieval context. This tutorial walks through the pipeline, where it wins over chunk-based RAG, and where it does not pay for itself.

SurePrompts Team
April 22, 2026
13 min read

TL;DR

GraphRAG builds a knowledge graph from your corpus — entities, relationships, community clusters — and uses entity relationships and community summaries as retrieval context. It wins on cross-document synthesis questions where chunk-based RAG returns individually relevant fragments that never assemble into an answer. It costs materially more to index and maintain, so the decision rule is about query mix, not sophistication. This tutorial covers the pipeline, local vs. global search, a hypothetical analyst-reports use case, honest cost tradeoffs, and when chunk-based RAG is still the right call.

Key takeaways:

  • Chunk-based RAG has a structural ceiling on "how are X and Y connected across the whole corpus" questions — no tuning of embeddings, reranking, or chunking closes it.
  • GraphRAG addresses that ceiling by building a knowledge graph — entities, relationships, community clusters — and retrieving over the graph rather than (or alongside) chunks.
  • Microsoft Research popularized the term and released a reference implementation in 2024 as open source on GitHub.
  • The pipeline is four stages: entity extraction, relationship extraction, community detection, graph-aware retrieval with local and global search modes.
  • Indexing is materially more expensive than chunk-and-embed. This is the central tradeoff — the graph has to pay for itself with a query mix that actually needs it.
  • GraphRAG is not a universal upgrade over RAG. It is the right tool when your evaluation surfaces cross-document reasoning failures and the wrong tool when users mostly ask single-document lookup questions.

Why chunk-based RAG hits a ceiling on cross-document questions

A chunk-based RAG pipeline does roughly this: split documents into passages, run each through an embedding model, store the vectors in a vector database, and at query time retrieve the top-k most similar chunks by semantic search. The generator answers from whatever survives retrieval. This is the workhorse pattern and, for a large class of questions, the correct one. See the RAG prompt engineering guide and retrieval-augmented prompting patterns for the full treatment.
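For reference, the chunk-and-embed loop can be sketched in a few lines. The bag-of-words "embedding" below is a stand-in for a real embedding model and vector database, kept local so the sketch runs anywhere:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the query; keep the top k.
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

chunks = [
    "Company A partnered with Provider B in 2023.",
    "Our pricing model changed in Q2.",
    "Provider B expanded its cloud regions.",
]
print(top_k("who partnered with Provider B", chunks, k=2))
```

The generator then answers from whatever `top_k` returns, which is exactly where the ceiling described next comes from.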

It has one structural weakness no tuning closes.

When the question asks how things relate across documents — when the answer is not contained in any single chunk but has to be synthesized by walking a network of mentions — chunk retrieval returns fragments that each look relevant on their own and fail as a group. The retriever surfaces five chunks about Company A and five about Company B. The generator produces a fluent answer that talks about both and misses the actual connection, because the connection lives in a sixth document that matches none of the query's terms, or only emerges when you aggregate signals across twenty documents that no top-k cutoff would ever include together.

This is not a ranking problem. Hybrid search does not fix it. Reranking does not fix it. Bigger context windows do not fix it — you can stuff 50 chunks into context and the model still struggles to construct a cross-document claim that no chunk makes explicitly. Chunks are nodes without edges, and some questions are fundamentally about edges.

What GraphRAG actually does

GraphRAG changes the retrieval substrate. Instead of retrieving chunks and handing them to the generator, the indexing step builds a knowledge graph from the corpus first, and retrieval happens over the graph.

The pipeline has four stages, run once at indexing and then incrementally as the corpus changes.

Stage 1: Entity extraction. An LLM reads each chunk and extracts the entities that appear in it — people, organizations, products, locations, events, domain-specific concepts — along with type labels and descriptions. This is plain extraction prompting at scale, with a fixed schema. Entities mentioned in multiple chunks are merged into single nodes.

Stage 2: Relationship extraction. The same LLM pass extracts relationships between co-occurring entities — "Company A partnered with Provider B in 2023", "Person X worked at Org Y from 2019 to 2022", "Product P is a component of Platform Q". Each relationship becomes a labeled, sourced edge in the graph with a pointer back to the passages that support it.
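Stages 1 and 2 together amount to structured extraction at scale. A minimal sketch with a stubbed LLM call; the JSON schema and field names here are illustrative, not the reference implementation's:

```python
import json

# Hypothetical fixed schema for the extraction prompt. A real pipeline
# sends this plus the chunk to an LLM and parses its JSON reply.
EXTRACTION_PROMPT = """Extract entities and relationships from the passage.
Return JSON: {"entities": [{"name", "type", "description"}],
              "relationships": [{"source", "target", "label"}]}"""

def fake_llm(chunk: str) -> str:
    # Stub standing in for an LLM call so the sketch runs offline.
    return json.dumps({
        "entities": [
            {"name": "Company A", "type": "org", "description": "SaaS vendor"},
            {"name": "Provider B", "type": "org", "description": "cloud provider"},
        ],
        "relationships": [
            {"source": "Company A", "target": "Provider B", "label": "partnered with"},
        ],
    })

def index_chunk(chunk: str, graph: dict) -> None:
    out = json.loads(fake_llm(chunk))
    for e in out["entities"]:
        # Merge mentions across chunks into a single node, keyed by name.
        graph["nodes"].setdefault(e["name"], e)
    for r in out["relationships"]:
        # Every edge keeps a pointer back to the passage that supports it.
        graph["edges"].append({**r, "source_passage": chunk})

graph = {"nodes": {}, "edges": []}
index_chunk("Company A partnered with Provider B in 2023.", graph)
print(len(graph["nodes"]), len(graph["edges"]))
```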

Stage 3: Community detection. With the graph assembled, a clustering algorithm — Leiden is the common choice — identifies densely connected subgraphs. These are communities: groups of entities that reference each other more than they reference outsiders. The indexer then uses an LLM to write a natural-language summary of each community, describing the theme, the major entities, and the key relationships. These community summaries become top-level retrieval units.
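A toy version of this stage using networkx. The reference implementation uses Leiden; networkx ships Louvain, which is close enough to illustrate the clustering step. The LLM-written summaries are replaced by printing each community's members:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy entity graph: two dense clusters joined by a single weak bridge.
G = nx.Graph()
G.add_edges_from([
    ("Company A", "Provider B"), ("Company A", "Product P"),
    ("Provider B", "Product P"),                       # cluster 1
    ("Vendor X", "Vendor Y"), ("Vendor X", "Vendor Z"),
    ("Vendor Y", "Vendor Z"),                          # cluster 2
    ("Product P", "Vendor X"),                         # bridge edge
])

communities = louvain_communities(G, seed=42)

for c in sorted(communities, key=len, reverse=True):
    # In a real pipeline an LLM would write a natural-language summary
    # of each community; here we just list its members.
    print("community:", sorted(c))
```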

Stage 4: Graph-aware retrieval. At query time, the retriever no longer does a single vector lookup. It chooses between retrieval strategies based on the question shape — typically local or global search, covered in the next section — and assembles context that mixes entity neighborhoods, relationship paths, community summaries, and the original source passages. The generator receives a structured context, not a flat list of chunks.

The Microsoft Research reference implementation — released as open source on GitHub in 2024 — is the canonical version. Several other implementations now exist. The pattern generalizes beyond any one library.

Local search vs. global search

GraphRAG's two retrieval modes are worth understanding independently, because they address different question types and have different costs.

Local search answers questions about specific entities. The query mentions something — a company, a product, a person — and the retriever pulls that entity's immediate graph neighborhood: the entity node, its direct relationships, connected entities one or two hops out, and the source passages that produced those relationships. This is the graph equivalent of a focused lookup, and it is cheap. It works well for questions like "tell me everything we know about Entity X" or "what is the relationship between Entity X and Entity Y".
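A local-search neighborhood pull is essentially a bounded breadth-first walk. A sketch over a hypothetical adjacency list:

```python
from collections import deque

# Toy adjacency list standing in for the entity graph.
EDGES = {
    "Company A": ["Provider B", "Product P"],
    "Provider B": ["Company A", "Provider C"],
    "Product P": ["Company A"],
    "Provider C": ["Provider B"],
}

def local_search(entity: str, hops: int = 2) -> set[str]:
    # Breadth-first walk of the entity's neighborhood, up to `hops` edges out.
    seen, frontier = {entity}, deque([(entity, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in EDGES.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

print(sorted(local_search("Company A", hops=2)))
```

A real system would then attach the relationships and source passages for every node in the returned set before handing context to the generator.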

Global search answers corpus-level questions that no single entity neighborhood can cover. "What are the main themes across our 5,000 analyst reports?" "Which strategic trends appear in our competitive intelligence from the last three years?" These questions require aggregating across the whole corpus, and local search — no matter how you seed it — returns the neighborhood of whatever entities the query happened to name.

Global search uses the community summaries built during indexing. The retriever scores each community summary against the query, pulls the top-ranked ones, and in the Microsoft reference design aggregates partial answers from each into a final response using a map-reduce-style pattern. This is expensive — multiple LLM calls per query at minimum, scaling with the number of communities considered — and it is the mode that justifies GraphRAG's existence for corpus-wide reasoning tasks.
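The map-reduce shape can be sketched with a word-overlap stand-in for summary scoring. A real system would score semantically and make one LLM call per selected community, plus a final merge call:

```python
def score(summary: str, query: str) -> int:
    # Stand-in for semantic similarity: shared-word count.
    return len(set(summary.lower().split()) & set(query.lower().split()))

def global_search(query: str, summaries: list[str], top_n: int = 2) -> list[str]:
    # Map step: each selected community summary would get its own LLM call
    # producing a partial answer; a final reduce call merges the partials.
    return sorted(summaries, key=lambda s: score(s, query), reverse=True)[:top_n]

summaries = [
    "Cluster around Provider B cloud partnerships and integrations",
    "Pricing model changes across competitors",
    "Cloud provider ecosystem partnerships in EMEA",
]
print(global_search("which cloud partnerships overlap", summaries))
```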

The right engineering posture is that local and global are different tools, not alternatives. A production GraphRAG system needs query routing — by rule, by classifier, or by a small judge prompt — that picks local for entity-focused questions and global for synthesis questions. Running global search on every query bleeds money; running local on every query misses the questions GraphRAG was built for.
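A rule-based router is enough to start. The cue words and entity list below are illustrative, not a recommended production vocabulary:

```python
# Words that signal a corpus-level synthesis question (illustrative list).
GLOBAL_CUES = {"themes", "trends", "across", "overall", "main", "all"}
# Entity names would come from the graph index in a real system.
KNOWN_ENTITIES = {"company a", "provider b", "product p"}

def route(query: str) -> str:
    q = query.lower()
    # Corpus-level cues win: synthesis questions go to global search.
    if GLOBAL_CUES & set(q.split()):
        return "global"
    # A named entity without aggregate cues is a focused lookup.
    if any(e in q for e in KNOWN_ENTITIES):
        return "local"
    return "local"  # default to the cheap mode

print(route("what are the main themes across our reports"))
print(route("tell me about Provider B"))
```

A classifier or judge prompt can replace the rules once you have routed traffic to learn from.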

Worked example — a hypothetical analyst-reports use case

Hypothetical scenario, not a shipped product. A competitive-intelligence team at a mid-sized SaaS company maintains roughly 5,000 analyst reports, newsletters, press releases, and internal memos spanning five years. They built a chunk-based RAG six months ago and use it daily.

The system handles most traffic well. "What did Gartner say about vendor X in Q2 2024?" returns a clean, sourced answer. "Summarize the key risks in this McKinsey report" — handled. "What is the pricing model of our top three competitors?" — handled. These are local-in-the-corpus questions, and chunk retrieval is fine at them.

Then a senior analyst asks: "Which of our competitors have partnered with the same cloud providers over the last three years, and which of those partnerships overlap with our own cloud relationships?"

The chunk-based system retrieves ten chunks. Three mention competitor-cloud partnerships. Two mention the company's own cloud relationships. Five are adjacent-but-not-quite — analyst commentary about the cloud market, a reseller press release, a procurement memo. The generator produces a confident-sounding answer that names two partnerships correctly, invents a third, and misses four that the corpus mentions but the retriever did not surface together.

This is not a tuning failure. The team already uses hybrid search and reranking. They ran a RAGAS evaluation — see the RAGAS evaluation walkthrough — and the scores told the story: faithfulness held on single-entity lookups, recall collapsed on multi-entity relational questions. The right chunks existed in the corpus; they never ranked into the top-k together because the query terms did not appear in all of them.

The team rebuilds the index as GraphRAG. Entities include companies, cloud providers, products, and agreement types. Relationships include "partnered with", "integrated with", "reseller of", each with a date and source passage. Community detection identifies a cluster around each major cloud provider's partnership ecosystem.

The same question now routes to global search. The retriever pulls community summaries for each cloud provider's ecosystem, identifies the overlap set, walks entity neighborhoods for each competitor named, and assembles a structured context: a list of partnerships per competitor with dates and source passages, plus the company's own cloud relationships, plus the intersection. The generator writes an answer grounded in that structure.

Illustrative metrics, not real measurements. On a 40-question evaluation of cross-document competitive-intelligence queries, hypothetical answer faithfulness might go from 0.68 to 0.84 and recall from 0.52 to 0.79. On single-document lookup queries — the bulk of daily traffic — the two systems score within noise of each other. GraphRAG did not become better at everything. It closed the specific gap that hurt most.

Cost: indexing is materially more expensive

The honest cost story, because it matters.

Building a GraphRAG index runs an LLM over every chunk at least twice — once for entity and relationship extraction, once for community summarization — plus aggregation calls for community-level summaries. On a mid-sized corpus this means hundreds of thousands of LLM calls at indexing time, sometimes millions. Chunk-and-embed runs a much cheaper embedding model per chunk and stores vectors. The two are not in the same cost bracket.
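A back-of-envelope comparison makes the bracket difference concrete. Every number below is an assumption for illustration, not a vendor quote:

```python
# Illustrative back-of-envelope with assumed numbers, not real pricing.
docs, chunks_per_doc = 5_000, 20
graph_passes = 2                 # extraction pass + summarization share
graph_tokens_per_call = 1_500    # prompt + chunk + structured output
graph_rate = 0.002               # assumed $ per 1K tokens, generation model

embed_tokens_per_chunk = 300
embed_rate = 0.0001              # assumed $ per 1K tokens, embedding model

chunks = docs * chunks_per_doc
graph_calls = chunks * graph_passes
graph_cost = graph_calls * graph_tokens_per_call / 1_000 * graph_rate
embed_cost = chunks * embed_tokens_per_chunk / 1_000 * embed_rate

# 200,000 LLM calls vs. 100,000 cheap embedding calls,
# and roughly two orders of magnitude apart in dollars.
print(graph_calls, round(graph_cost), round(embed_cost, 2))
```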

Maintenance is also more expensive. When documents change, chunk-based RAG re-embeds affected chunks and updates the vector store. GraphRAG has to re-extract entities and relationships, reconcile them with the existing graph, and potentially re-run community detection and summarization on affected subgraphs. The Microsoft reference implementation supports incremental updates, but the work is greater per document and grows with graph size.

Query-time cost is higher too, especially for global search, which aggregates across multiple community summaries per query. Local search is roughly comparable to a normal RAG query. Global search is decisively more expensive — and is the mode that answers questions regular RAG cannot, so the expense has a direct justification when used for the right queries and is waste when applied indiscriminately.

None of this makes GraphRAG wrong. It makes the decision about query mix. If 10% of your queries are cross-document synthesis and those 10% matter disproportionately — senior analysts, high-value decisions, memorable failures — the indexing investment pays. If those queries are noise-level in your traffic, it does not.

When to skip GraphRAG

Four situations where chunk-based RAG is the correct choice and GraphRAG is premature.

Your query mix is mostly single-document lookup. If users ask "what does this document say about X", chunk retrieval is the right shape. Adding a graph adds indexing cost without adding answers.

Your corpus is small. Under a few hundred documents, the graph is thin, community detection produces weak clusters, and the cost of maintaining the pipeline exceeds the benefit. A well-tuned chunk system with hybrid search and reranking usually covers small corpora fine.

You have not evaluated the current system. If you cannot point at a specific cross-document failure mode in your RAGAS or other eval scores, you do not yet know whether GraphRAG would help. Build the eval, find the failure, then pick the retrieval substrate that closes it.

Your entities are not well-defined in the corpus. GraphRAG depends on an LLM extracting clean entities and relationships. If your domain has fuzzy entity boundaries — a field where the "things" are arguments, opinions, or evolving abstractions rather than named objects — the extraction step produces a noisy graph, and downstream retrieval inherits the noise.

None of these are permanent. Corpora grow, query mixes shift, evaluation improves. The decision is about where you are now, not where you might eventually be.

Failure modes

Three anti-patterns worth flagging.

Shipping GraphRAG without query routing. Running global search on every query is expensive. Running local search on every query defeats the purpose. Either add a router — rule-based is fine to start, classifier or judge prompt later — or be honest that the system is a more expensive chunk-RAG with extra steps.

Treating extraction quality as free. The graph is only as good as the entity and relationship extraction that built it. If the extractor is a cheap model run without evaluation, the graph inherits every miss, duplicate, and hallucinated edge. Evaluate extraction quality independently — a small human-labeled sample of extracted entities and relationships against source passages — before trusting retrieval scores.
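A minimal extraction check: precision and recall of predicted relationship triples against a small human-labeled gold set. The triples here are hypothetical:

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    # Exact-match scoring of (source, label, target) triples.
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {("Company A", "partnered with", "Provider B"),
        ("Person X", "worked at", "Org Y")}
predicted = {("Company A", "partnered with", "Provider B"),
             ("Company A", "acquired", "Org Y")}  # one hit, one hallucinated edge

print(precision_recall(predicted, gold))
```

In practice you would also score fuzzy matches (same relationship, different surface form), but exact-match on a labeled sample catches the worst extractors cheaply.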

Ignoring the stale-graph problem. Knowledge graphs go stale. New documents add entities that should merge with existing ones but get created as new nodes. Relationships that used to be true get contradicted by newer documents and need versioning. A GraphRAG system without a plan for graph hygiene decays quietly. Schedule a periodic reconciliation pass, even if it is expensive.
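One cheap piece of graph hygiene is merging near-duplicate entity nodes by normalized name. A sketch, not a full reconciliation pass; real merging also needs type checks and, often, an LLM judge for ambiguous pairs:

```python
import re

def canon(name: str) -> str:
    # Normalize entity names so near-duplicate nodes merge on re-index.
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

nodes = ["Provider B", "provider-B", "PROVIDER B", "Vendor X"]
merged: dict[str, list[str]] = {}
for n in nodes:
    merged.setdefault(canon(n), []).append(n)

# Report only the keys that collapsed multiple surface forms.
print({k: v for k, v in merged.items() if len(v) > 1})
```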

Our position

Four opinionated stances.

  • GraphRAG is a surgical tool, not a default. Reach for it when your RAGAS or equivalent evals surface cross-document reasoning failures that chunk retrieval structurally cannot close. Do not reach for it because it sounds more advanced than chunk RAG.
  • Evaluate first, pick substrate second. The question is not "should we use GraphRAG". It is "what does our evaluation say our retrieval layer is failing at, and what is the cheapest substrate that fixes that failure". Sometimes the answer is hybrid search plus reranking. Sometimes it is GraphRAG. The decision is downstream of the measurement.
  • Separate local and global budgets. Treat global search as a premium query class — routed explicitly, rate-limited, optionally gated by user role. It is the mode that justifies GraphRAG, and it is also the mode that burns money fastest.
  • Graph hygiene is a recurring engineering cost, not a one-time setup. Budget for ongoing reconciliation, entity merging, and stale-relationship cleanup. A graph that is not maintained becomes a worse retrieval substrate than the chunks it replaced.
