Question 1

What are the key features of Iris Eval?

Accepted Answer

Hierarchical trace logging with latency, token usage, and cost tracking.
12 built-in evaluation rules covering completeness, relevance, safety, and cost.
Real-time web dashboard for trace visualization and cost breakdowns.
Automatic agent discovery for MCP-compatible clients.
Custom evaluation rules defined with Zod schemas.
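Here is what "hierarchical" means in practice: per-span token usage and cost roll up through a tree of spans, and the slowest tool call can be picked out of it. This minimal TypeScript sketch shows both. The `Span` shape and the `rollUp` and `slowestLeaf` helpers are hypothetical illustrations, not Iris Eval's actual schema or API.

```typescript
// Hypothetical span shape for illustration only; Iris Eval's stored schema may differ.
interface Span {
  name: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  children: Span[];
}

interface Totals {
  tokens: number;
  costUsd: number;
  spanCount: number;
}

// Sum tokens, cost, and span count over a span and all of its descendants.
function rollUp(span: Span): Totals {
  const totals: Totals = {
    tokens: span.inputTokens + span.outputTokens,
    costUsd: span.costUsd,
    spanCount: 1,
  };
  for (const child of span.children) {
    const sub = rollUp(child);
    totals.tokens += sub.tokens;
    totals.costUsd += sub.costUsd;
    totals.spanCount += sub.spanCount;
  }
  return totals;
}

// Return the leaf span (for example, a single tool call) with the highest latency.
function slowestLeaf(span: Span): Span {
  if (span.children.length === 0) return span;
  let slowest = slowestLeaf(span.children[0]);
  for (const child of span.children.slice(1)) {
    const candidate = slowestLeaf(child);
    if (candidate.latencyMs > slowest.latencyMs) slowest = candidate;
  }
  return slowest;
}

// Example trace: one agent run with two child tool calls.
const trace: Span = {
  name: "agent_run",
  latencyMs: 1800,
  inputTokens: 500,
  outputTokens: 200,
  costUsd: 0.004,
  children: [
    { name: "search_docs", latencyMs: 900, inputTokens: 0, outputTokens: 0, costUsd: 0, children: [] },
    { name: "summarize", latencyMs: 600, inputTokens: 1200, outputTokens: 300, costUsd: 0.006, children: [] },
  ],
};

console.log(rollUp(trace)); // tokens 2200, spanCount 3, costUsd roughly 0.01
console.log(slowestLeaf(trace).name); // search_docs
```

Summing at query time keeps each span's own numbers simple. A real store might precompute totals instead.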

Question 2

What can I use Iris Eval for?

Accepted Answer

Monitoring production AI agents for PII leakage and hallucination markers. Enforcing budget thresholds to prevent runaway AI agent costs. Debugging agent performance by analyzing hierarchical span trees. Auditing agent tool usage to optimize efficiency and reduce unnecessary calls.
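A budget threshold comes down to summing cost per agent over a time window and comparing the total with a limit. The sketch below shows that check. It assumes a hypothetical flattened `TraceRecord` (agent, timestamp, cost in USD). The helpers `costByAgent` and `overBudget` are illustrative names, not part of Iris Eval.

```typescript
// Hypothetical, flattened trace record for illustration; not Iris Eval's actual schema.
interface TraceRecord {
  agent: string;
  timestamp: number; // Unix epoch milliseconds
  costUsd: number;
}

// Sum cost per agent for traces in the half-open window [start, end).
function costByAgent(traces: TraceRecord[], start: number, end: number): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of traces) {
    if (t.timestamp < start || t.timestamp >= end) continue;
    totals.set(t.agent, (totals.get(t.agent) ?? 0) + t.costUsd);
  }
  return totals;
}

// List agents whose spend in the window exceeds budgetUsd, sorted by name.
function overBudget(traces: TraceRecord[], start: number, end: number, budgetUsd: number): string[] {
  const flagged: string[] = [];
  costByAgent(traces, start, end).forEach((cost, agent) => {
    if (cost > budgetUsd) flagged.push(agent);
  });
  return flagged.sort();
}

const HOUR_MS = 60 * 60 * 1000;
const sampleTraces: TraceRecord[] = [
  { agent: "researcher", timestamp: 0, costUsd: 0.75 },
  { agent: "researcher", timestamp: HOUR_MS, costUsd: 0.5 },
  { agent: "writer", timestamp: HOUR_MS, costUsd: 0.25 },
  { agent: "writer", timestamp: 5 * HOUR_MS, costUsd: 2.0 }, // outside the 2-hour window
];

// researcher spent 1.25 USD in the first two hours, so it is the only agent flagged.
console.log(overBudget(sampleTraces, 0, 2 * HOUR_MS, 1.0));
```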

Question 3

What tools does Iris Eval provide?

Accepted Answer

log_trace: Log an agent execution with spans, tool calls, token usage, and cost.
evaluate_output: Score output quality against completeness, relevance, safety, and cost rules.
get_traces: Query stored traces with filtering, pagination, and time-range support.
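MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. Its `params` carry the tool `name` and its `arguments`. The sketch below builds such a request for `get_traces`. The argument names `limit` and `since` are assumptions for illustration, so check the input schema each tool advertises through `tools/list` before relying on them.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a tools/call request; each call gets a fresh, increasing JSON-RPC id.
function callTool(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical arguments: the 20 most recent traces from the last 24 hours.
const now = Date.now();
const request = callTool("get_traces", {
  limit: 20,
  since: new Date(now - 24 * 60 * 60 * 1000).toISOString(),
});
console.log(JSON.stringify(request));
```

In practice an MCP client library sends this envelope for you. Seeing it helps when debugging an HTTP-transport setup by hand.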

Question 4

How do I install Iris Eval?

Accepted Answer

Install Iris Eval by running: npx @iris-eval/mcp-server

Question 5

What are the requirements for Iris Eval?

Accepted Answer

Iris Eval has no required environment variables. The following are optional: IRIS_TRANSPORT, IRIS_PORT, IRIS_DB_PATH, IRIS_LOG_LEVEL, and IRIS_DASHBOARD. You'll also need Node.js (which provides npx) and an MCP-compatible client such as Claude Desktop or Claude Code.

Question 6

Is Iris Eval free to use?

Accepted Answer

Yes, Iris Eval is open source and free to use. You can find the source code on GitHub.

Question 7

What MCP clients support Iris Eval?

Accepted Answer

Iris Eval works with any MCP-compatible client including Claude Desktop (Anthropic's official desktop app), Claude Code (CLI tool), Cursor, and other editors with MCP support.

Question 8

How do I configure Iris Eval?

Accepted Answer

Configure Iris Eval by adding a server entry to your MCP client's config file. Ready-to-paste setups are available for Claude Code, Cursor, Codex, Windsurf, and Claude Desktop.
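Many MCP clients, including Claude Desktop and Cursor, read a JSON object with an `mcpServers` key. That key maps a server name to the command that launches the server. Codex uses a TOML file instead. The sketch below generates such an entry using the `npx @iris-eval/mcp-server` command from the install answer. The key `iris-eval` is an arbitrary label. Passing `IRIS_*` variables through `env` only illustrates the optional settings and is not a required step.

```typescript
// One server entry in an mcpServers-style client config.
interface ServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Build a config with a single Iris Eval entry; "iris-eval" is an arbitrary key.
// The env block is only included when at least one variable is supplied.
function irisConfig(env: Record<string, string> = {}): McpConfig {
  const entry: ServerEntry = { command: "npx", args: ["@iris-eval/mcp-server"] };
  if (Object.keys(env).length > 0) entry.env = env;
  return { mcpServers: { "iris-eval": entry } };
}

// Print a config that also turns on debug logging via IRIS_LOG_LEVEL.
console.log(JSON.stringify(irisConfig({ IRIS_LOG_LEVEL: "debug" }), null, 2));
```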


Feature	Description
Trace Logging	Hierarchical span trees with per-tool-call latency, token usage, and cost in USD. Stored in SQLite, queryable instantly.
Output Evaluation	12 built-in rules across 4 categories: completeness, relevance, safety, cost. PII detection, prompt injection patterns, hallucination markers. Add custom rules with Zod schemas.
Cost Visibility	Aggregate cost across all agents over any time window. Set budget thresholds. Get flagged when agents overspend.
Web Dashboard	Real-time dark-mode UI with trace visualization, eval results, and cost breakdowns.
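The evaluation model can be sketched as a set of rules, each grouped into a category and each passing or failing a given output. Several assumptions are made purely for illustration: the regular expressions for email addresses and US Social Security numbers, the plain `Rule` interface, and `scoreOutput` (score = fraction of rules passed). Iris Eval's built-in rules and its Zod-based custom rule format may work differently.

```typescript
// Illustrative patterns only; Iris Eval's built-in PII rules may detect more or different formats.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
const US_SSN = /\b\d{3}-\d{2}-\d{4}\b/;

// Hypothetical rule shape; a stand-in for the Zod-based custom rules the source mentions.
interface Rule {
  name: string;
  category: "completeness" | "relevance" | "safety" | "cost";
  check: (output: string) => boolean; // true means the output passes
}

const rules: Rule[] = [
  { name: "no_email", category: "safety", check: (o) => !EMAIL.test(o) },
  { name: "no_ssn", category: "safety", check: (o) => !US_SSN.test(o) },
  { name: "min_length", category: "completeness", check: (o) => o.trim().length >= 20 },
];

// Score = fraction of rules passed; failed lists rule names in rule order.
function scoreOutput(output: string, ruleSet: Rule[]): { score: number; failed: string[] } {
  const failed = ruleSet.filter((r) => !r.check(output)).map((r) => r.name);
  const score = ruleSet.length === 0 ? 1 : (ruleSet.length - failed.length) / ruleSet.length;
  return { score, failed };
}

// Fails no_email and passes the other two rules, so the score is 2/3.
console.log(scoreOutput("Contact jane.doe@example.com for the full report.", rules));
```

Regex checks like these are cheap but produce false positives and misses. They suit a first-pass flag rather than a guarantee that output is PII-free.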

Flag	Default	Description
`--transport`	`stdio`	Transport type: `stdio` or `http`
`--port`	`3000`	HTTP transport port
`--db-path`	`~/.iris/iris.db`	SQLite database path
`--config`	`~/.iris/config.json`	Config file path
`--api-key`	—	API key for HTTP authentication
`--dashboard`	`false`	Enable web dashboard
`--dashboard-port`	`6920`	Dashboard port

Variable	Description
`IRIS_TRANSPORT`	Transport type
`IRIS_PORT`	HTTP port
`IRIS_DB_PATH`	Database path
`IRIS_LOG_LEVEL`	Log level: debug, info, warn, error
`IRIS_DASHBOARD`	Enable dashboard
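The CLI flags and environment variables overlap, so the server has to decide which one wins. This sketch resolves settings with an assumed precedence: flag first, then environment variable, then the defaults listed in the CLI table. The precedence, the accepted `IRIS_DASHBOARD` values, and the `resolveSettings` helper are all hypothetical, so confirm the real behavior against the Iris Eval source.

```typescript
type Transport = "stdio" | "http";

interface Settings {
  transport: Transport;
  port: number;
  dbPath: string;
  dashboard: boolean;
}

// Defaults taken from the CLI Arguments table.
const DEFAULTS: Settings = {
  transport: "stdio",
  port: 3000,
  dbPath: "~/.iris/iris.db", // the tilde is not expanded here
  dashboard: false,
};

function isTransport(value: string): value is Transport {
  return value === "stdio" || value === "http";
}

// Read "--flag value" or "--flag=value" from argv; undefined if the flag is absent.
function flagValue(argv: string[], flag: string): string | undefined {
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === flag) return argv[i + 1];
    if (argv[i].startsWith(flag + "=")) return argv[i].slice(flag.length + 1);
  }
  return undefined;
}

// Assumed precedence: CLI flag, then environment variable, then default.
function resolveSettings(argv: string[], env: Record<string, string | undefined>): Settings {
  const transport = flagValue(argv, "--transport") ?? env.IRIS_TRANSPORT ?? DEFAULTS.transport;
  if (!isTransport(transport)) throw new Error(`Unsupported transport: ${transport}`);

  const portText = flagValue(argv, "--port") ?? env.IRIS_PORT;
  const port = portText === undefined ? DEFAULTS.port : Number(portText);
  if (!Number.isInteger(port) || port < 1 || port > 65535) throw new Error(`Invalid port: ${portText}`);

  // Assumption: IRIS_DASHBOARD counts as enabled when set to "true" or "1".
  const dashboardEnv = env.IRIS_DASHBOARD;
  const dashboard = argv.includes("--dashboard") || dashboardEnv === "true" || dashboardEnv === "1";

  return {
    transport,
    port,
    dbPath: flagValue(argv, "--db-path") ?? env.IRIS_DB_PATH ?? DEFAULTS.dbPath,
    dashboard,
  };
}

const settings = resolveSettings(["--transport", "http", "--port=8080"], { IRIS_DB_PATH: "/tmp/iris.db" });
console.log(settings); // transport "http", port 8080, dbPath "/tmp/iris.db", dashboard false
```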

Iris Eval MCP Server

Iris — The Agent Eval Standard for MCP

