From a8f1f6083c68a3deae166d737f620c151c7933c5 Mon Sep 17 00:00:00 2001
From: Paul Carleton
Date: Wed, 25 Mar 2026 11:33:24 +0000
Subject: [PATCH 1/2] docs: add AGENTS.md and CONTRIBUTING.md

Codifies scenario design guidelines from past PR reviews:
- fewer scenarios with more checks (CI cost multiplies per-SDK)
- same check id for SUCCESS and FAIL, optimize for Ctrl+F
- extend the everything-client/server rather than adding new example files
- prove it passes and fails
- reuse the CLI runner, don't add parallel entry points

Also asks contributors to open an issue for discussion before a PR, and
to target specific spec MUSTs or open issues rather than running generic
AI bug-hunts against the harness.
---
 AGENTS.md       | 81 +++++++++++++++++++++++++++++++++++++++++++++++++
 CONTRIBUTING.md | 46 ++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+)
 create mode 100644 AGENTS.md
 create mode 100644 CONTRIBUTING.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..91984e7
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,81 @@
+# AGENTS.md
+
+Guidance for AI agents (and humans) contributing to the MCP conformance test framework.
+
+## What this repo is
+
+A test harness that exercises MCP SDK implementations against the protocol spec. The coverage number that matters here is **spec coverage** — how much of the protocol the scenarios test.
+
+Uses **npm** (not pnpm/yarn). Don't commit `pnpm-lock.yaml` or `yarn.lock`.
+
+## Where to start
+
+**Open an issue first.** Before writing a scenario, open an issue describing which part of the spec you want to cover and roughly how. A quick discussion up front avoids wasted effort on scenarios that overlap existing work or don't fit the suite structure.
+
+**Don't point an agent at the repo and ask it to "find bugs."** Generic bug-hunting on a test harness produces low-signal PRs (typo fixes, unused-variable cleanups, speculative refactors).
If you want to contribute via an agent, give it a concrete target:
+
+- Pick a specific MUST or SHOULD from the [MCP spec](https://modelcontextprotocol.io/specification/) that has no scenario yet, and ask the agent to draft one.
+- Pick an [open issue](https://github.com/modelcontextprotocol/conformance/issues) and work on that.
+
+The valuable contribution here is **spec coverage**, not harness polish.
+
+## Scenario design: fewer scenarios, more checks
+
+**The strongest rule in this repo:** prefer one scenario with many checks over many scenarios with one check each.
+
+Why:
+
+- Each scenario often spins up its own HTTP server. These suites run in CI on every push for every SDK, so per-scenario overhead multiplies fast.
+- Less code to maintain and update when the spec shifts.
+- Progress on making an SDK better shows up as "pass 7/10 checks" rather than "pass 1 test, fail another" — finer-grained signal from the same run.
+
+### Granularity heuristic
+
+Ask: **"Would it make sense for someone to implement a server/client that does just this scenario?"**
+
+If two scenarios would always be implemented together, merge them. Examples:
+
+- `tools/list` + a simple `tools/call` → one scenario
+- All content-type variants (image, audio, mixed, resource) → one scenario
+- Full OAuth flow with token refresh → one scenario, not separate "basic" + "refresh" scenarios. A client that passes "basic" but not "refresh" just shows up as passing N−2 checks.
+
+Keep scenarios separate when they're genuinely independent features or when they're mutually exclusive (e.g., an SDK should support writing a server that _doesn't_ implement certain stateful features).
+
+### When a PR adds scenarios
+
+- Start with **one end-to-end scenario** covering the happy path with many checks along the way.
+- Don't add "step 1 only" and "step 1+2" as separate scenarios — the second subsumes the first.
+- Register the scenario in the appropriate suite list in `src/scenarios/index.ts` (`core`, `extensions`, `backcompat`, etc.).
+
+## Check conventions
+
+- **Same `id` for SUCCESS and FAIL.** A check should use one slug and flip `status` + `errorMessage`, not branch into `foo-success` vs `foo-failure` slugs.
+- **Optimize for Ctrl+F on the slug.** Repetitive check blocks are fine — easier to find the failing one than to unwind a clever helper.
+- Reuse `ConformanceCheck` and other types from `src/types.ts` rather than defining parallel shapes.
+- Include `specReferences` pointing to the relevant spec section.
+
+## Descriptions and wording
+
+Be precise about what's **required** vs **optional**. A scenario description that tests optional behavior should make that clear — e.g. "Tests that a client _that wants a refresh token_ handles offline_access scope…" not "Tests that a client handles offline_access scope…". Don't accidentally promote a MAY/SHOULD to a MUST in the prose.
+
+When in doubt about spec details (OAuth parameters, audiences, grant types), check the actual spec in `modelcontextprotocol` rather than guessing.
+
+## Examples: prove it passes and fails
+
+A new scenario should come with:
+
+1. **A passing example** — usually by extending `examples/clients/typescript/everything-client.ts` or the everything-server, not a new file.
+2. **Evidence it fails when it should** — ideally a negative example (a deliberately broken client), or at minimum a manual run showing the failure mode.
+
+Delete unused example scenarios. If a scenario key in the everything-client has no corresponding test, remove it.
+
+## Don't add new ways to run tests
+
+Use the existing CLI runner (`npx @modelcontextprotocol/conformance client|server ...`). If you need a feature the runner doesn't have, add it to the runner rather than building a parallel entry point.
+
+## Before opening a PR
+
+- `npm run build` passes
+- `npm test` passes
+- For non-trivial scenario changes, run against at least one real SDK (typescript-sdk or python-sdk) to see actual output. For changes to shared infrastructure (runner, tier-check), test against go-sdk or csharp-sdk too.
+- Scenario is registered in the right suite in `src/scenarios/index.ts`
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..8152b0f
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,46 @@
+# Contributing
+
+Thanks for helping improve the MCP conformance suite!
+
+The most valuable contributions are **new conformance scenarios** that cover under-tested parts of the [MCP spec](https://modelcontextprotocol.io/specification/). If you're not sure where to start, ask in `#conformance-testing-wg` on the MCP Contributors Discord.
+
+## Before you start
+
+**Open an issue first.** Describe which part of the spec you want to cover and roughly how — a short discussion up front saves everyone time on scenarios that overlap existing work or don't fit the suite structure.
+
+Then read **[AGENTS.md](./AGENTS.md)** — it's the design guide for scenarios and checks. The short version:
+
+- **Fewer scenarios, more checks.** Each scenario spins up its own server and runs in CI for every SDK. One scenario with 10 checks beats 10 scenarios with one check each.
+- **Prove it passes and fails.** Extend the existing everything-client/server to pass your scenario, and show (or include) a failing case.
+- **Reuse the CLI runner.** Don't add parallel entry points.
+
+If you're using an AI agent to help, please **don't** point it at the repo with a generic "find bugs" prompt — give it a specific MUST from the spec or an open issue to work on. See AGENTS.md for details.
+
+## Setup
+
+```sh
+npm install
+npm run build
+npm test
+```
+
+This repo uses **npm** — don't commit `pnpm-lock.yaml` or `yarn.lock`.
+
+## Running your scenario
+
+```sh
+# Against the bundled TypeScript example
+npm run build
+node dist/index.js client --command "tsx examples/clients/typescript/everything-client.ts" --scenario
+
+# Against a server
+node dist/index.js server --url http://localhost:3000/mcp --scenario
+```
+
+See the [README](./README.md) for full CLI options and the [SDK Integration Guide](./SDK_INTEGRATION.md) for testing against a real SDK.
+
+## Pull requests
+
+- Register your scenario in the right suite in `src/scenarios/index.ts`
+- Run against at least one real SDK before opening the PR — we'll ask what the output looked like
+- Keep PRs focused; one feature or scenario group at a time
From 6f92f46616619421202a299a88f983bf46426e5f Mon Sep 17 00:00:00 2001
From: Paul Carleton
Date: Wed, 25 Mar 2026 11:34:29 +0000
Subject: [PATCH 2/2] docs: broaden issue-first guidance to cover bugs, not just scenarios

---
 AGENTS.md       | 2 +-
 CONTRIBUTING.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 91984e7..35e854f 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -10,7 +10,7 @@ Uses **npm** (not pnpm/yarn). Don't commit `pnpm-lock.yaml` or `yarn.lock`.
 
 ## Where to start
 
-**Open an issue first.** Before writing a scenario, open an issue describing which part of the spec you want to cover and roughly how. A quick discussion up front avoids wasted effort on scenarios that overlap existing work or don't fit the suite structure.
+**Open an issue first** — whether you've hit a bug in the harness or want to propose a new scenario. For scenarios, sketch which part of the spec you want to cover and roughly how; for bugs, include the command you ran and the output. Either way, a short discussion up front beats review churn on a PR that overlaps existing work or heads in a direction we're not going.
 
 **Don't point an agent at the repo and ask it to "find bugs."** Generic bug-hunting on a test harness produces low-signal PRs (typo fixes, unused-variable cleanups, speculative refactors). If you want to contribute via an agent, give it a concrete target:
 
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 8152b0f..7f018da 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,7 +6,7 @@ The most valuable contributions are **new conformance scenarios** that cover und
 
 ## Before you start
 
-**Open an issue first.** Describe which part of the spec you want to cover and roughly how — a short discussion up front saves everyone time on scenarios that overlap existing work or don't fit the suite structure.
+**Open an issue first** — whether you've found a bug or want to propose a new scenario. A short discussion up front saves everyone time on PRs that overlap existing work or head in a direction we're not going.
 
 Then read **[AGENTS.md](./AGENTS.md)** — it's the design guide for scenarios and checks. The short version:
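Illustration (not part of the patch): the "same `id` for SUCCESS and FAIL" convention that AGENTS.md introduces could look roughly like the TypeScript sketch below. Everything in it is an assumption made for illustration: the `SketchCheck` shape, the `"SUCCESS"`/`"FAILURE"` status strings, the layout of `specReferences`, the `tools-list-returns-tools` slug, and the helper name are all hypothetical. The real `ConformanceCheck` type lives in `src/types.ts` and may differ.

```typescript
// Hypothetical, simplified check shape for illustration only.
// The repo's real ConformanceCheck type (src/types.ts) may differ.
type SketchStatus = "SUCCESS" | "FAILURE";

interface SketchCheck {
  id: string; // one slug, shared by the passing and failing outcome
  status: SketchStatus;
  errorMessage?: string; // only present when the check fails
  specReferences: { id: string; url: string }[]; // assumed layout
}

// Builds one check whose slug stays fixed; only status and
// errorMessage change depending on the observed tool count.
function checkToolsListReturnsTools(toolCount: number): SketchCheck {
  const ok = toolCount > 0;
  return {
    id: "tools-list-returns-tools", // hypothetical slug
    status: ok ? "SUCCESS" : "FAILURE",
    // Attach errorMessage only on failure instead of inventing a second slug.
    ...(ok ? {} : { errorMessage: `expected at least one tool, got ${toolCount}` }),
    specReferences: [
      { id: "MCP-Spec", url: "https://modelcontextprotocol.io/specification/" },
    ],
  };
}
```

Both outcomes share one slug, so searching for `tools-list-returns-tools` finds the check whether it passed or failed, which is the Ctrl+F property the guideline asks for.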