Agentic RAG
Agentic RAG is a pattern where retrieval is treated as a tool call inside an agent loop rather than as a fixed first step in a linear pipeline. A traditional RAG pipeline always retrieves once, then generates; an agentic RAG system lets the model decide whether to retrieve at all, what query to issue, when to re-retrieve after reading the first batch of results, and when it has enough to answer. Retrieval can happen several times, interleaved with reasoning, tool calls, or sub-queries generated mid-trace. The upside is that easy questions skip retrieval entirely, hard multi-hop questions get the extra passes they need, and the system can recover when the first retrieval misses. The cost is latency variance, harder evals, and the usual agent failure modes — runaway loops and over-retrieval.
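The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `run_agent`, `Action`, `Trace`, `toy_decide`, and `toy_retrieve` are hypothetical names, `toy_decide` stands in for the model's decision step, and the `max_retrievals` budget is one simple guard against runaway loops.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str  # "retrieve" or "answer"
    text: str  # the query to issue, or the final answer


@dataclass
class Trace:
    question: str
    queries: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)


def run_agent(question, decide, retrieve, max_retrievals=4):
    """Let `decide` choose between retrieving and answering on every step."""
    trace = Trace(question)
    while len(trace.queries) < max_retrievals:
        action = decide(trace)
        if action.kind == "answer":
            return action.text, trace
        # Retrieval is just another tool call inside the loop.
        trace.queries.append(action.text)
        trace.evidence.extend(retrieve(action.text))
    # Budget exhausted: stop instead of looping forever.
    return "(retrieval budget reached)", trace


# Hypothetical toy corpus and policy, standing in for a real index and model.
DOCS = {"refund policy": ["Refunds are issued within 14 days."]}


def toy_retrieve(query):
    return DOCS.get(query, [])


def toy_decide(trace):
    if trace.question == "What is 2 + 2?":
        return Action("answer", "4")  # easy question: skip retrieval
    if not trace.evidence:
        # First query misses on purpose; the second is a reformulation.
        query = "refunds" if not trace.queries else "refund policy"
        return Action("retrieve", query)
    return Action("answer", trace.evidence[0])


easy_answer, easy_trace = run_agent("What is 2 + 2?", toy_decide, toy_retrieve)
hard_answer, hard_trace = run_agent("How long do refunds take?", toy_decide, toy_retrieve)
print(easy_answer, easy_trace.queries)
print(hard_answer, hard_trace.queries)
```

The easy question returns without touching the retriever, while the second question recovers from a missed first query by re-retrieving. A policy that never chooses "answer" is cut off once the budget is spent.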
Example
A customer-support assistant gets the question "why did my invoice total change between last month and this one?". A linear RAG pipeline retrieves once on the raw question and returns generic billing-policy docs. The agentic version instead issues three interleaved retrievals (pricing tiers, the account's recent subscription changes, and the specific invoice line items) and only then drafts an answer that reconciles the three sources. With illustrative numbers, average latency rises from 1.2s to 2.8s, while answer accuracy on multi-hop billing questions rises from 0.62 to 0.84.
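The billing scenario can be sketched as three lookups followed by a reconciliation step. Everything here is invented for illustration: the account ID, months, plan names, prices, and the `explain_invoice_change` helper are all hypothetical, and the retrieval order is hard-coded, whereas a real agent would choose each query after reading the previous results.

```python
# Invented data standing in for three separate sources.
PRICING_TIERS = {"basic": 20.0, "pro": 50.0}
SUBSCRIPTION_CHANGES = {
    "acct-1": [{"month": "2024-05", "from": "basic", "to": "pro"}],
}
INVOICE_LINES = {
    ("acct-1", "2024-04"): [("basic plan", 20.0)],
    ("acct-1", "2024-05"): [("pro plan", 50.0)],
}


def explain_invoice_change(account, prev_month, cur_month):
    retrievals = []

    # Retrieval 1: pricing tiers.
    tiers = PRICING_TIERS
    retrievals.append("pricing tiers")

    # Retrieval 2: the account's subscription changes in the current month.
    changes = [
        c for c in SUBSCRIPTION_CHANGES.get(account, [])
        if c["month"] == cur_month
    ]
    retrievals.append("subscription changes")

    # Retrieval 3: line items for both invoices.
    prev_total = sum(amount for _, amount in INVOICE_LINES[(account, prev_month)])
    cur_total = sum(amount for _, amount in INVOICE_LINES[(account, cur_month)])
    retrievals.append("invoice line items")

    # Reconcile the three sources into one answer.
    delta = cur_total - prev_total
    if not changes:
        return f"Total changed by {delta:+.2f}; no subscription change was found.", retrievals
    change = changes[0]
    expected = tiers[change["to"]] - tiers[change["from"]]
    if expected == delta:
        answer = (
            f"Total changed by {delta:+.2f} because the plan moved "
            f"from {change['from']} to {change['to']}."
        )
    else:
        answer = f"Total changed by {delta:+.2f}; the plan change explains {expected:+.2f} of it."
    return answer, retrievals


answer, retrievals = explain_invoice_change("acct-1", "2024-04", "2024-05")
print(answer)
print(retrievals)
```

A single retrieval on the raw question would not surface any of these three records, which is why the linear pipeline falls back to generic policy text.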