# MCP Evals
## Overview
MCP Evals is a Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.
## Installation
### As a Node.js Package
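Install the package from npm (assuming it is published under the name `mcp-evals`):

```bash
npm install mcp-evals
```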
### As a GitHub Action
Add the following to your workflow file:
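A sketch of such a workflow. The action reference and the input names (`evals_path`, `server_path`, `openai_api_key`) are assumptions; check the action's own documentation for the exact values:

```yaml
name: Run MCP Evals

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  pull-requests: write # required so the action can comment on the PR

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Action path and input names are illustrative; consult the action's
      # README for the exact `uses:` reference and inputs.
      - name: Run MCP Evals
        uses: mclenhard/mcp-evals@v1
        with:
          evals_path: 'src/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
```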
## Usage
### 1. Create Your Evaluation File
Create a file (e.g., `evals.ts`) that exports your evaluation configuration:
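A minimal sketch of such a file. The exports (`EvalConfig`, `EvalFunction`, `grade`) and the `grade` signature are assumptions about the package's API; the `@ai-sdk/openai` model helper is used for illustration. Adjust the imports to match the package's actual exports:

```typescript
import { openai } from "@ai-sdk/openai";
// `grade`, `EvalConfig`, and `EvalFunction` are assumed exports of mcp-evals.
import { grade, EvalConfig, EvalFunction } from "mcp-evals";

// One evaluation: pose a question that should exercise a tool,
// then let the grading model score the answer.
const weatherEval: EvalFunction = {
  name: "Weather Tool Evaluation",
  description: "Evaluates the accuracy and completeness of weather retrieval",
  run: async (model) => {
    const result = await grade(model, "What is the weather in New York?");
    return JSON.parse(result); // assumes grade returns a JSON string
  },
};

const config: EvalConfig = {
  model: openai("gpt-4"), // model used to run and score the evals
  evals: [weatherEval],   // evaluation functions to execute
};

export default config;
```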
### 2. Run the Evaluations
#### As a Node.js Package
You can run the evaluations using the CLI:
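For example (the `mcp-eval` binary name and argument order are assumptions; adjust to the CLI's actual usage):

```bash
# usage: npx mcp-eval <path to evals file> <path to server entry point>
npx mcp-eval src/evals.ts src/index.ts
```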
#### As a GitHub Action
The action will automatically:
- Run your evaluations
- Post the results as a comment on the PR
- Update the comment if the PR is updated
## Evaluation Results
Each evaluation returns an object with the following structure:
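A sketch of that shape, assuming the LLM grader scores a handful of dimensions on a numeric scale; the exact field names are assumptions:

```typescript
interface EvalResult {
  accuracy: number;         // how factually correct the answer was
  completeness: number;     // whether the answer fully covered the request
  relevance: number;        // how on-topic the answer was
  clarity: number;          // how clearly the answer was expressed
  reasoning: number;        // quality of the reasoning shown
  overall_comments: string; // free-form summary from the grading model
}
```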
## Configuration
### Environment Variables
- `OPENAI_API_KEY`: Your OpenAI API key (required)
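For local CLI runs, the key is typically supplied through the shell environment:

```bash
export OPENAI_API_KEY="sk-your-key" # replace with your actual key
```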
### Evaluation Configuration
The `EvalConfig` interface requires:

- `model`: The language model to use for evaluation (e.g., GPT-4)
- `evals`: Array of evaluation functions to run
Each evaluation function must implement:

- `name`: Name of the evaluation
- `description`: Description of what the evaluation tests
- `run`: Async function that takes a model and returns an `EvalResult`
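Taken together, these types might be declared roughly as follows. This is a sketch mirroring the descriptions above, not the package's published definitions; `EvalResult` is the shape sketched under Evaluation Results, and importing it from `mcp-evals` is an assumption:

```typescript
import type { EvalResult } from "mcp-evals"; // assumed export; see sketch above

interface EvalFunction {
  name: string;        // name of the evaluation
  description: string; // what the evaluation tests
  // Executes one evaluation against the given model and returns its scores.
  // `unknown` stands in for whatever model type the underlying LLM SDK uses.
  run: (model: unknown) => Promise<EvalResult>;
}

interface EvalConfig {
  model: unknown;        // the language model used for evaluation (e.g., GPT-4)
  evals: EvalFunction[]; // evaluation functions to run
}
```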
## License
MIT