Open source RAG benchmarking

RAGScope

Ground truth for your retrieval pipeline

Upload a corpus, run four retrieval strategies, and see ranked scores for faithfulness, context utilization, and answer relevancy. Know what works on your data before shipping.

12 strategy runs per day - no account required.

Why it matters

Stop guessing which retrieval strategy fits your corpus

Three metrics that tell the full story

Faithfulness measures whether the answer is grounded in your documents. Context utilization measures whether the retrieved chunks were relevant. Answer relevancy measures whether the answer addressed the question. Most RAG evaluations ignore at least one of these. RAGScope scores all three.

Four retrieval strategies, one benchmark run

Naive RAG, HyDE, Multi-Query, and Hybrid BM25+Dense. Combine any of them with contextual compression as an optional post-retrieval step. Run against the same question on the same corpus. See the ranked result. Know which approach fits your data before writing a line of production code.

Bring your own key for unlimited runs

Guest access covers 12 strategy runs per day using the shared API key. Paste your own OpenAI or Anthropic key to remove all limits. Your key is read directly from the browser and never forwarded to the backend.

What you get

Everything you need for serious RAG evaluation

Benchmark 4 strategies head to head

Upload a corpus, ask a question, and receive a ranked comparison across four retrieval approaches. Radar charts, latency bars, and a sortable comparison table give you the full picture at a glance.

Ground truth metrics with RAGAS

Every run is evaluated by RAGAS, an open-source framework that uses GPT-4o-mini as an impartial judge. Scores are reproducible, explainable, and aligned with real retrieval quality, not proxy metrics.

Modular and extensible by design

Every component follows a registry pattern. Add a new retrieval strategy, chunker, or LLM provider with a single file. No other code changes required. The API and UI discover it automatically.

Ready to measure what actually matters?

Start with your own corpus and question. Results in under a minute. No account required.