Eval & Testing

LLM and AI-agent evaluation, prompt testing, and benchmarking.

17 vendors in this category

A

Agenta

Eval & Testing

Open-source LLM evaluation, prompt management, and observability platform

Germany

A

Arize AI

Eval & Testing

Agent observability, evaluation, and improvement platform with open-source Phoenix

USA

A

Arthur AI

Eval & Testing

ML monitoring, fairness, and LLM evaluation platform for enterprises

USA

Braintrust

Platforms & Products

AI observability platform for building quality AI products with evaluations, monitoring, and optimization tools

Horizontal (Industry Agnostic)

C

Confident AI (DeepEval)

Eval & Testing

Open-source LLM evaluation framework with hosted observability platform

USA

D

Deepchecks

Eval & Testing

Continuous validation and testing platform for ML models and LLM apps

Israel

Galileo

Analytics & Conversation Intelligence

AI observability and evaluation platform that turns offline evals into production guardrails for AI systems

Horizontal (Industry Agnostic)

G

Giskard

Eval & Testing

Open-source LLM testing and evaluation framework for quality and safety

France

L

LangWatch

Conversational & Voice QA

AI agent testing, LLM evaluation, and observability platform

Netherlands

M

Maxim AI

Eval & Testing

End-to-end evaluation and observability platform for AI agents

India/USA

O

Openlayer

Eval & Testing

AI agent evaluation and stress-testing platform for pre-deployment

USA

Opik

Open Source Projects

Open-source LLM evaluation platform for debugging, evaluating, and monitoring LLM applications and RAG systems

Horizontal (Industry Agnostic)

P

Patronus AI

Eval & Testing

Automated LLM evaluation and security platform for regulated industries

USA

P

Promptfoo

Eval & Testing

Open-source AI security and testing platform for LLM vulnerabilities

USA

R

Ragas

Eval & Testing

Open-source framework for evaluating RAG pipelines and LLM applications

USA

T

TruLens / TruEra

Eval & Testing

Open-source LLM evaluation and tracing framework (TruEra acquired by Snowflake)

USA

V

Vellum

Eval & Testing

LLM development platform with evaluation and testing capabilities

USA