# AGENTS.md

Guidance for AI agents (and humans) contributing to the MCP conformance test framework.

## What this repo is

A test harness that exercises MCP SDK implementations against the protocol spec. The coverage number that matters here is **spec coverage** — how much of the protocol the scenarios test.

Uses **npm** (not pnpm/yarn). Don't commit `pnpm-lock.yaml` or `yarn.lock`.

## Where to start

**Open an issue first** — whether you've hit a bug in the harness or want to propose a new scenario. For scenarios, sketch which part of the spec you want to cover and roughly how; for bugs, include the command you ran and the output. Either way, a short discussion up front beats review churn on a PR that overlaps existing work or heads in a direction we're not going.

**Don't point an agent at the repo and ask it to "find bugs."** Generic bug-hunting on a test harness produces low-signal PRs (typo fixes, unused-variable cleanups, speculative refactors). If you want to contribute via an agent, give it a concrete target:

- Pick a specific MUST or SHOULD from the [MCP spec](https://modelcontextprotocol.io/specification/) that has no scenario yet, and ask the agent to draft one.
- Pick an [open issue](https://github.com/modelcontextprotocol/conformance/issues) and work on that.

The valuable contribution here is **spec coverage**, not harness polish.

## Scenario design: fewer scenarios, more checks

**The strongest rule in this repo:** prefer one scenario with many checks over many scenarios with one check each.

Why:

- Each scenario often spins up its own HTTP server. These suites run in CI on every push for every SDK, so per-scenario overhead multiplies fast.
- Less code to maintain and update when the spec shifts.
- Progress on making an SDK better shows up as "pass 7/10 checks" rather than "pass 1 test, fail another" — finer-grained signal from the same run.
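A sketch of the shape this encourages (names and field layout here are illustrative only, not the framework's real API — reuse the actual types from `src/types.ts`):

```typescript
// Illustrative sketch: one scenario walks the whole happy path and
// records a check at each step, instead of one scenario per step.
type Status = "SUCCESS" | "FAILURE";

interface Check {
  id: string;
  status: Status;
  errorMessage?: string;
}

// Hypothetical summary of what one client session observed.
interface SessionResult {
  tools: { name: string }[];
  callResult: unknown;
}

function toolsScenarioChecks(session: SessionResult): Check[] {
  const checks: Check[] = [];

  const hasTools = session.tools.length > 0;
  checks.push({
    id: "tools-list-nonempty",
    status: hasTools ? "SUCCESS" : "FAILURE",
    errorMessage: hasTools ? undefined : "tools/list returned no tools",
  });

  const hasResult = session.callResult != null;
  checks.push({
    id: "tools-call-returns-result",
    status: hasResult ? "SUCCESS" : "FAILURE",
    errorMessage: hasResult ? undefined : "tools/call returned no result",
  });

  // ...more checks from the same session: content types, pagination, etc.
  return checks;
}
```

One server spin-up, one session, many checks — an SDK that gets halfway there still produces a partial pass count rather than a wall of failed scenarios.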

### Granularity heuristic

Ask: **"Would it make sense for someone to implement a server/client that does just this scenario?"**

If two scenarios would always be implemented together, merge them. Examples:

- `tools/list` + a simple `tools/call` → one scenario
- All content-type variants (image, audio, mixed, resource) → one scenario
- Full OAuth flow with token refresh → one scenario, not separate "basic" + "refresh" scenarios. A client that passes "basic" but not "refresh" just shows up as passing N−2 checks.

Keep scenarios separate when they're genuinely independent features or when they're mutually exclusive (e.g., an SDK should support writing a server that _doesn't_ implement certain stateful features).

### When a PR adds scenarios

- Start with **one end-to-end scenario** covering the happy path with many checks along the way.
- Don't add "step 1 only" and "step 1+2" as separate scenarios — the second subsumes the first.
- Register the scenario in the appropriate suite list in `src/scenarios/index.ts` (`core`, `extensions`, `backcompat`, etc.).

## Check conventions

- **Same `id` for SUCCESS and FAIL.** A check should use one slug and flip `status` + `errorMessage`, not branch into `foo-success` vs `foo-failure` slugs.
- **Optimize for Ctrl+F on the slug.** Repetitive check blocks are fine — easier to find the failing one than to unwind a clever helper.
- Reuse `ConformanceCheck` and other types from `src/types.ts` rather than defining parallel shapes.
- Include `specReferences` pointing to the relevant spec section.
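As a sketch of the first convention (the field names below follow this guide; the real `ConformanceCheck` shape lives in `src/types.ts` and the actual status values may differ):

```typescript
// Illustrative only -- reuse the real ConformanceCheck from src/types.ts.
interface ConformanceCheck {
  id: string;
  status: "SUCCESS" | "FAILURE";
  errorMessage?: string;
  specReferences: string[];
}

// One slug; only status and errorMessage flip. Never two separate ids
// like "refresh-token-success" vs "refresh-token-failure".
function refreshTokenCheck(gotRefreshToken: boolean): ConformanceCheck {
  return {
    id: "oauth-refresh-token-issued",
    status: gotRefreshToken ? "SUCCESS" : "FAILURE",
    errorMessage: gotRefreshToken
      ? undefined
      : "server did not issue a refresh token",
    specReferences: ["https://modelcontextprotocol.io/specification/"],
  };
}
```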

## Descriptions and wording

Be precise about what's **required** vs **optional**. A scenario description that tests optional behavior should make that clear — e.g. "Tests that a client _that wants a refresh token_ handles offline_access scope…" not "Tests that a client handles offline_access scope…". Don't accidentally promote a MAY/SHOULD to a MUST in the prose.

When in doubt about spec details (OAuth parameters, audiences, grant types), check the actual spec in the `modelcontextprotocol` repository rather than guessing.

## Examples: prove it passes and fails

A new scenario should come with:

1. **A passing example** — usually by extending `examples/clients/typescript/everything-client.ts` or the everything-server, not a new file.
2. **Evidence it fails when it should** — ideally a negative example (a deliberately broken client), or at minimum a manual run showing the failure mode.

Delete unused example scenarios. If a scenario key in the everything-client has no corresponding test, remove it.
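The everything-client's dispatch can be pictured like this (a hypothetical, simplified sketch — the real client is async and lives in `examples/clients/typescript/everything-client.ts`):

```typescript
// Hypothetical sketch: the client maps scenario keys to handlers.
// A key with no corresponding registered scenario is dead code -- delete it.
const handlers: Record<string, () => void> = {
  "tools-happy-path": () => {
    /* exercise tools/list + tools/call for that scenario's checks */
  },
};

function runScenario(key: string): void {
  const handler = handlers[key];
  if (!handler) throw new Error(`unknown scenario key: ${key}`);
  handler();
}
```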

## Don't add new ways to run tests

Use the existing CLI runner (`npx @modelcontextprotocol/conformance client|server ...`). If you need a feature the runner doesn't have, add it to the runner rather than building a parallel entry point.

## Before opening a PR

- `npm run build` passes
- `npm test` passes
- For non-trivial scenario changes, run against at least one real SDK (typescript-sdk or python-sdk) to see actual output. For changes to shared infrastructure (runner, tier-check), test against go-sdk or csharp-sdk too.
- Scenario is registered in the right suite in `src/scenarios/index.ts`
# Contributing

Thanks for helping improve the MCP conformance suite!

The most valuable contributions are **new conformance scenarios** that cover under-tested parts of the [MCP spec](https://modelcontextprotocol.io/specification/). If you're not sure where to start, ask in `#conformance-testing-wg` on the MCP Contributors Discord.

## Before you start

**Open an issue first** — whether you've found a bug or want to propose a new scenario. A short discussion up front saves everyone time on PRs that overlap existing work or head in a direction we're not going.

Then read **[AGENTS.md](./AGENTS.md)** — it's the design guide for scenarios and checks. The short version:

- **Fewer scenarios, more checks.** Each scenario spins up its own server and runs in CI for every SDK. One scenario with 10 checks beats 10 scenarios with one check each.
- **Prove it passes and fails.** Extend the existing everything-client/server to pass your scenario, and show (or include) a failing case.
- **Reuse the CLI runner.** Don't add parallel entry points.

If you're using an AI agent to help, please **don't** point it at the repo with a generic "find bugs" prompt — give it a specific MUST from the spec or an open issue to work on. See AGENTS.md for details.

## Setup

```sh
npm install
npm run build
npm test
```

This repo uses **npm** — don't commit `pnpm-lock.yaml` or `yarn.lock`.

## Running your scenario

```sh
# Against the bundled TypeScript example
npm run build
node dist/index.js client --command "tsx examples/clients/typescript/everything-client.ts" --scenario <your-scenario>

# Against a server
node dist/index.js server --url http://localhost:3000/mcp --scenario <your-scenario>
```

See the [README](./README.md) for full CLI options and the [SDK Integration Guide](./SDK_INTEGRATION.md) for testing against a real SDK.

## Pull requests

- Register your scenario in the right suite in `src/scenarios/index.ts`
- Run against at least one real SDK before opening the PR — we'll ask what the output looked like
- Keep PRs focused; one feature or scenario group at a time