RAGAS

RAGAS is an open-source evaluation framework for retrieval-augmented generation (RAG) systems. It decomposes RAG quality into four primary metrics: faithfulness (does the answer stick to the retrieved context?), answer relevance (does the answer address the question?), context precision (how useful were the retrieved documents?), and context recall (were all relevant documents retrieved?). Faithfulness and answer relevance are scored internally using an LLM as judge, while context precision and recall compare retrieval results against reference annotations. RAGAS is a standard starting point for teams that want quantitative signals on RAG pipelines rather than eyeballing outputs, and it integrates with most popular RAG stacks.
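To make the two retrieval metrics concrete, here is a simplified, set-based sketch of what context precision and context recall measure. The real RAGAS implementations use an LLM judge and rank-aware weighting rather than exact set membership; the document names and scores below are illustrative only.

```python
# Simplified sketch of the retrieval-side metrics.
# RAGAS itself uses an LLM judge and rank-aware weighting;
# this shows only the underlying precision/recall idea.

def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved documents that are actually relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in set(retrieved) if doc in relevant)
    return hits / len(relevant)

# Hypothetical retrieval result against a reference annotation.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = {"doc1", "doc3", "doc5"}
print(round(context_precision(retrieved, relevant), 3))  # 0.5
print(round(context_recall(retrieved, relevant), 3))     # 0.667
```

The trade-off between the two is exactly what the example below illustrates: fetching more documents tends to raise recall while diluting precision.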

Example

A knowledge-base Q&A team wires RAGAS into CI. On every change to the retriever or generator prompt, the harness runs 200 reference questions through the pipeline, computes the four metrics, and posts the deltas to the pull request. A retriever tweak that raises context recall from 0.72 to 0.81 but drops context precision from 0.88 to 0.74 is flagged — the new retriever is fetching more of the right documents but also more noise, so the team adds a re-ranker before shipping.
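A CI gate like the one described above can be sketched as a simple delta check: compare the candidate run's metrics against the baseline and flag any metric that regresses beyond a tolerance. The function name, the 0.05 tolerance, and the metric dictionaries are hypothetical, not part of RAGAS.

```python
# Hypothetical CI gate over RAGAS-style metric scores.
# Flags any metric whose candidate score drops more than
# `tolerance` below the baseline; thresholds are illustrative.

def flag_regressions(baseline: dict[str, float],
                     candidate: dict[str, float],
                     tolerance: float = 0.05) -> list[str]:
    """Return metrics whose candidate score regressed beyond tolerance."""
    return [metric for metric, score in baseline.items()
            if candidate.get(metric, 0.0) < score - tolerance]

# Scores from the example: recall improved, precision dropped sharply.
baseline = {"faithfulness": 0.91, "answer_relevance": 0.87,
            "context_precision": 0.88, "context_recall": 0.72}
candidate = {"faithfulness": 0.90, "answer_relevance": 0.88,
             "context_precision": 0.74, "context_recall": 0.81}
print(flag_regressions(baseline, candidate))  # ['context_precision']
```

Posting this list on the pull request is what surfaces the precision regression before the retriever change ships.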
