Decision Guide

Three structured guides for answering “which eval framework(s) should we use?” Skip what’s not relevant to your situation.

How to Choose a Guide

If you have a specific problem to solve → By Use Case
Examples: “I need to evaluate my RAG pipeline”, “I want to red team my chatbot”, “I need production monitoring”

If you’re unsure where to start → By Team Type
Examples: “We’re a 3-person startup”, “We’re an enterprise ML platform team”, “We’re subject to the EU AI Act”

If you’re building an eval strategy, not just picking one tool → By Lifecycle Stage
The key insight: prototyping, pre-production testing, and production monitoring each need different frameworks, and most teams are missing coverage for at least one of these stages.

The 5-Minute Decision

Answer these three questions:

1. What are you evaluating?

RAG pipeline → RAGAS
Conversational / agentic LLM → DeepEval
Adversarial robustness → promptfoo
Production traffic → Langfuse
Safety for high-risk AI → inspect_ai

2. What stage are you at?

Active development → RAGAS or DeepEval (Stage 1)
Pre-launch → add promptfoo (Stage 2)
Live in production → add Langfuse (Stage 3)
Preparing for compliance audit → add inspect_ai (Stage 4)

3. What’s your regulatory context?

No regulatory pressure → promptfoo for pre-launch adversarial testing; skip inspect_ai for now
EU-regulated / high-risk AI → inspect_ai is the most defensible choice for compliance evidence
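
The three questions above can be sketched as a small lookup function. This is only an illustration of the decision flow, not part of any framework: the function name `recommend`, the dictionary keys, and the stage labels are hypothetical, and it assumes that each “add” in question 2 is cumulative (a later stage keeps everything added at earlier stages).

```python
# Question 1: what is being evaluated -> starting framework (from the list above).
# The keys are hypothetical labels chosen for this sketch.
PRIMARY_BY_TARGET = {
    "rag": "RAGAS",
    "conversational": "DeepEval",
    "adversarial": "promptfoo",
    "production": "Langfuse",
    "safety": "inspect_ai",
}

# Question 2: stages in order, with the framework each one adds.
# Stage 1 adds nothing beyond the starting framework.
STAGE_ADDITIONS = [
    ("development", None),
    ("pre_launch", "promptfoo"),
    ("production", "Langfuse"),
    ("compliance_audit", "inspect_ai"),
]


def recommend(target: str, stage: str, eu_regulated: bool = False) -> list:
    """Return a suggested framework stack for the three answers."""
    if target not in PRIMARY_BY_TARGET:
        raise ValueError(f"unknown target: {target}")
    stage_names = [name for name, _ in STAGE_ADDITIONS]
    if stage not in stage_names:
        raise ValueError(f"unknown stage: {stage}")

    stack = [PRIMARY_BY_TARGET[target]]

    # Assumption: additions accumulate up to and including the current stage.
    for _, addition in STAGE_ADDITIONS[: stage_names.index(stage) + 1]:
        if addition is not None and addition not in stack:
            stack.append(addition)

    # Question 3: EU-regulated / high-risk AI pulls in inspect_ai at any stage.
    if eu_regulated and "inspect_ai" not in stack:
        stack.append("inspect_ai")
    return stack


if __name__ == "__main__":
    print(recommend("rag", "production"))
    print(recommend("conversational", "development", eu_regulated=True))
```

Treat the result as a starting point for discussion; the detailed guides cover the trade-offs that a lookup like this flattens.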

Quick Reference Table

Situation	Framework	Why
RAG retrieval accuracy	RAGAS	Purpose-built retrieval metrics
LLM CI test suite	DeepEval	pytest-native, threshold-based
Adversarial pre-launch	promptfoo	OWASP LLM Top 10, auto-generated attack cases
Production tracing	Langfuse	Live inference monitoring, EU Cloud available
EU AI Act evidence	inspect_ai	Audit-grade logs, AISI institutional backing
Public benchmark contribution	OpenAI Evals	Template evals defined in YAML plus JSONL data, no code required
Startup, start here	RAGAS + Langfuse	Maximum signal, minimal setup
Enterprise regulated, start here	inspect_ai + Langfuse EU	Compliance-first stack
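
To make “pytest-native, threshold-based” in the table concrete, here is a framework-agnostic sketch of what such a CI check looks like. It deliberately does not use DeepEval's API: `keyword_overlap_score`, `fake_model`, and the `0.7` threshold are hypothetical stand-ins for a real metric, a real model call, and a tuned cutoff.

```python
THRESHOLD = 0.7  # hypothetical pass mark; tune per metric and use case


def keyword_overlap_score(answer: str, expected_keywords: list) -> float:
    """Stand-in metric: fraction of expected keywords found in the answer."""
    if not expected_keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def fake_model(prompt: str) -> str:
    """Stand-in for the LLM call under test; returns a canned answer."""
    return "Paris is the capital of France."


def test_capital_answer_meets_threshold():
    # pytest collects functions named test_*; a failed assert fails the CI run.
    answer = fake_model("What is the capital of France?")
    score = keyword_overlap_score(answer, ["Paris", "France"])
    assert score >= THRESHOLD, f"score {score:.2f} is below threshold {THRESHOLD}"
```

Saved as a `test_*.py` file, this runs under plain `pytest`. A dedicated eval framework would replace the stand-in metric with an LLM-specific one while keeping the same pass/fail shape.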

→ Back to main README
