# AGENTS.md

Guidance for AI agents (and humans) contributing to the MCP conformance test framework.

## What this repo is

A test harness that exercises MCP SDK implementations against the protocol spec. The coverage number that matters here is **spec coverage** — how much of the protocol the scenarios test.

Uses **npm** (not pnpm/yarn). Don't commit `pnpm-lock.yaml` or `yarn.lock`.

## Where to start

**Open an issue first** — whether you've hit a bug in the harness or want to propose a new scenario. For scenarios, sketch which part of the spec you want to cover and roughly how; for bugs, include the command you ran and the output. Either way, a short discussion up front beats review churn on a PR that overlaps existing work or heads in a direction we're not going.

**Don't point an agent at the repo and ask it to "find bugs."** Generic bug-hunting on a test harness produces low-signal PRs (typo fixes, unused-variable cleanups, speculative refactors). If you want to contribute via an agent, give it a concrete target:

- Pick a specific MUST or SHOULD from the [MCP spec](https://modelcontextprotocol.io/specification/) that has no scenario yet, and ask the agent to draft one.
- Pick an [open issue](https://github.com/modelcontextprotocol/conformance/issues) and work on that.

The valuable contribution here is **spec coverage**, not harness polish.

## Scenario design: fewer scenarios, more checks

**The strongest rule in this repo:** prefer one scenario with many checks over many scenarios with one check each.

Why:

- Each scenario often spins up its own HTTP server. These suites run in CI on every push for every SDK, so per-scenario overhead multiplies fast.
- Less code to maintain and update when the spec shifts.
- Progress on making an SDK better shows up as "pass 7/10 checks" rather than "pass 1 test, fail another" — finer-grained signal from the same run.
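A sketch of the shape this encourages (names and field layout here are illustrative only, not the framework's real API — reuse the actual types from `src/types.ts`):

```typescript
// Illustrative sketch: one scenario walks the whole happy path and
// records a check at each step, instead of one scenario per step.
type Status = "SUCCESS" | "FAILURE";

interface Check {
  id: string;
  status: Status;
  errorMessage?: string;
}

// Hypothetical summary of what one client session observed.
interface SessionResult {
  tools: { name: string }[];
  callResult: unknown;
}

function toolsScenarioChecks(session: SessionResult): Check[] {
  const checks: Check[] = [];

  const hasTools = session.tools.length > 0;
  checks.push({
    id: "tools-list-nonempty",
    status: hasTools ? "SUCCESS" : "FAILURE",
    errorMessage: hasTools ? undefined : "tools/list returned no tools",
  });

  const hasResult = session.callResult != null;
  checks.push({
    id: "tools-call-returns-result",
    status: hasResult ? "SUCCESS" : "FAILURE",
    errorMessage: hasResult ? undefined : "tools/call returned no result",
  });

  // ...more checks from the same session: content types, pagination, etc.
  return checks;
}
```

One server spin-up, one session, many checks — an SDK that gets halfway there still produces a partial pass count rather than a wall of failed scenarios.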

### Granularity heuristic

Ask: **"Would it make sense for someone to implement a server/client that does just this scenario?"**

If two scenarios would always be implemented together, merge them. Examples:

- `tools/list` + a simple `tools/call` → one scenario
- All content-type variants (image, audio, mixed, resource) → one scenario
- Full OAuth flow with token refresh → one scenario, not separate "basic" + "refresh" scenarios. A client that passes "basic" but not "refresh" just shows up as passing N−2 checks.

Keep scenarios separate when they're genuinely independent features or when they're mutually exclusive (e.g., an SDK should support writing a server that _doesn't_ implement certain stateful features).

### When a PR adds scenarios

- Start with **one end-to-end scenario** covering the happy path with many checks along the way.
- Don't add "step 1 only" and "step 1+2" as separate scenarios — the second subsumes the first.
- Register the scenario in the appropriate suite list in `src/scenarios/index.ts` (`core`, `extensions`, `backcompat`, etc.).

## Check conventions

- **Same `id` for SUCCESS and FAIL.** A check should use one slug and flip `status` + `errorMessage`, not branch into `foo-success` vs `foo-failure` slugs.
- **Optimize for Ctrl+F on the slug.** Repetitive check blocks are fine — easier to find the failing one than to unwind a clever helper.
- Reuse `ConformanceCheck` and other types from `src/types.ts` rather than defining parallel shapes.
- Include `specReferences` pointing to the relevant spec section.
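As a sketch of the first convention (the field names below follow this guide; the real `ConformanceCheck` shape lives in `src/types.ts` and the actual status values may differ):

```typescript
// Illustrative only -- reuse the real ConformanceCheck from src/types.ts.
interface ConformanceCheck {
  id: string;
  status: "SUCCESS" | "FAILURE";
  errorMessage?: string;
  specReferences: string[];
}

// One slug; only status and errorMessage flip. Never two separate ids
// like "refresh-token-success" vs "refresh-token-failure".
function refreshTokenCheck(gotRefreshToken: boolean): ConformanceCheck {
  return {
    id: "oauth-refresh-token-issued",
    status: gotRefreshToken ? "SUCCESS" : "FAILURE",
    errorMessage: gotRefreshToken
      ? undefined
      : "server did not issue a refresh token",
    specReferences: ["https://modelcontextprotocol.io/specification/"],
  };
}
```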

## Descriptions and wording

Be precise about what's **required** vs **optional**. A scenario description that tests optional behavior should make that clear — e.g. "Tests that a client _that wants a refresh token_ handles offline_access scope…" not "Tests that a client handles offline_access scope…". Don't accidentally promote a MAY/SHOULD to a MUST in the prose.

When in doubt about spec details (OAuth parameters, audiences, grant types), check the actual spec in the `modelcontextprotocol` repository rather than guessing.

## Examples: prove it passes and fails

A new scenario should come with:

1. **A passing example** — usually by extending `examples/clients/typescript/everything-client.ts` or the everything-server, not a new file.
2. **Evidence it fails when it should** — ideally a negative example (a deliberately broken client), or at minimum a manual run showing the failure mode.

Delete unused example scenarios. If a scenario key in the everything-client has no corresponding test, remove it.
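The everything-client's dispatch can be pictured like this (a hypothetical, simplified sketch — the real client is async and lives in `examples/clients/typescript/everything-client.ts`):

```typescript
// Hypothetical sketch: the client maps scenario keys to handlers.
// A key with no corresponding registered scenario is dead code -- delete it.
const handlers: Record<string, () => void> = {
  "tools-happy-path": () => {
    /* exercise tools/list + tools/call for that scenario's checks */
  },
};

function runScenario(key: string): void {
  const handler = handlers[key];
  if (!handler) throw new Error(`unknown scenario key: ${key}`);
  handler();
}
```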

## Don't add new ways to run tests

Use the existing CLI runner (`npx @modelcontextprotocol/conformance client|server ...`). If you need a feature the runner doesn't have, add it to the runner rather than building a parallel entry point.

## Before opening a PR

- `npm run build` passes
- `npm test` passes
- For non-trivial scenario changes, run against at least one real SDK (typescript-sdk or python-sdk) to see actual output. For changes to shared infrastructure (runner, tier-check), test against go-sdk or csharp-sdk too.
- Scenario is registered in the right suite in `src/scenarios/index.ts`
# Contributing

Thanks for helping improve the MCP conformance suite!

The most valuable contributions are **new conformance scenarios** that cover under-tested parts of the [MCP spec](https://modelcontextprotocol.io/specification/). If you're not sure where to start, ask in `#conformance-testing-wg` on the MCP Contributors Discord.

## Before you start

**Open an issue first** — whether you've found a bug or want to propose a new scenario. A short discussion up front saves everyone time on PRs that overlap existing work or head in a direction we're not going.

Then read **[AGENTS.md](./AGENTS.md)** — it's the design guide for scenarios and checks. The short version:

- **Fewer scenarios, more checks.** Each scenario spins up its own server and runs in CI for every SDK. One scenario with 10 checks beats 10 scenarios with one check each.
- **Prove it passes and fails.** Extend the existing everything-client/server to pass your scenario, and show (or include) a failing case.
- **Reuse the CLI runner.** Don't add parallel entry points.

If you're using an AI agent to help, please **don't** point it at the repo with a generic "find bugs" prompt — give it a specific MUST from the spec or an open issue to work on. See AGENTS.md for details.

## Setup

```sh
npm install
npm run build
npm test
```

This repo uses **npm** — don't commit `pnpm-lock.yaml` or `yarn.lock`.

## Running your scenario

```sh
# Against the bundled TypeScript example
npm run build
node dist/index.js client --command "tsx examples/clients/typescript/everything-client.ts" --scenario <your-scenario>

# Against a server
node dist/index.js server --url http://localhost:3000/mcp --scenario <your-scenario>
```

See the [README](./README.md) for full CLI options and the [SDK Integration Guide](./SDK_INTEGRATION.md) for testing against a real SDK.

## Pull requests

- Register your scenario in the right suite in `src/scenarios/index.ts`
- Run against at least one real SDK before opening the PR — we'll ask what the output looked like
- Keep PRs focused; one feature or scenario group at a time