38 changes: 38 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,38 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  python-sdk:
    name: Python SDK (py${{ matrix.python-version }})
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: sdk/python
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run tests
        run: python -m pytest --tb=short -q
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
.claude/settings.local.json
.idea
.vscode
dist
node_modules
.DS_Store
__pycache__
*.pyc
.env
69 changes: 69 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,69 @@
[Problem Statement](docs/problem-statement.md) |
[Use Cases](docs/use-cases.md) |
[Approaches](docs/approaches.md) |
[Open Questions](docs/open-questions.md) |
[Experimental Findings](docs/experimental-findings.md) |
[Related Work](docs/related-work.md) |
Contributing

# Contributing

## How to Participate

This Interest Group welcomes contributions from anyone interested in tool annotations, data classification, and sensitivity metadata for MCP. You can participate by:

- Opening or commenting on GitHub Discussions in this repo
- Sharing experimental findings from your own implementations
- Contributing to documentation and pattern evaluation

## Communication Channels

| Channel | Purpose | Response Expectation |
|---------|---------|----------------------|
| [Discord: #tool-annotations-ig](https://discord.com/channels/1358869848138059966/1482836798517543073) | Quick questions, coordination, async discussion | Best effort |
| GitHub Discussions | Long-form technical proposals, experimental findings | Weekly triage |
| This repository | Living reference for approaches, findings, and decisions | Updated after meetings |

## Meetings

*To be scheduled* — likely biweekly working sessions once the group establishes momentum.

Meeting norms:

- Agendas published 24 hours in advance
- Notes published within 48 hours

## Decision-Making

As an Interest Group, we operate by **rough consensus** — we're exploring and recommending, not deciding. Outputs include:

- Documented requirements and use cases
- Evaluated approaches with findings
- Recommendations to relevant WGs or as SEP proposals

## Contribution Guidelines

### Documenting Approaches and Findings

When adding experimental findings or new approaches:

- Include enough detail for others to reproduce or evaluate
- Note which clients and servers were tested
- Be explicit about what worked, what didn't, and what remains untested
- Attribute community input with GitHub handles and link to the source where possible

### Community Input

When adding quotes or input from community discussions:

- Attribute to the contributor by name and GitHub handle
- Link to the original source (Discord thread, GitHub comment, etc.) where possible
- Present input as blockquotes to distinguish it from editorial content

### Filing Issues

Use GitHub Issues for:

- Proposing new approaches or use cases
- Reporting gaps in documentation
- Tracking action items from meetings
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 MCP Trust Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
85 changes: 85 additions & 0 deletions README.md
@@ -1 +1,86 @@
# experimental-ext-tool-annotations

> ⚠️ **Experimental** — This repository is an incubation space for the Tool Annotations Interest Group. Contents are exploratory and do not represent official MCP specifications or recommendations.

## Mission

This Interest Group explores how MCP tool annotations can be enhanced to support data classification, sensitivity labeling, and provenance tracking — enabling hosts and clients to make informed, policy-driven decisions about data flow across tool boundaries.

As a starting point, we are contributing a reference SDK implementation based on [SEP-1913](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/1913) (Trust & Sensitivity Annotations).

## Scope

### In Scope

- **Requirements gathering:** Documenting use cases and constraints for tool-level data sensitivity metadata
- **Pattern exploration:** Testing and evaluating annotation schemas, policy enforcement models, and sensitivity propagation strategies
- **Proof of concepts:** Maintaining a shared repo of reference implementations and experimental findings (starting with the SEP-1913 Python SDK)

### Out of Scope

- **Approving spec changes:** This IG does not have authority to approve protocol changes; recommendations flow through the SEP process
- **Implementation mandates:** We can document patterns but not require specific client or server behavior

## Problem Statement

MCP tools today are semantically opaque when it comes to data sensitivity. A tool's definition includes its name, description, and input schema — but nothing about the nature of the data it returns or processes. There is no machine-readable signal that allows a host to differentiate a benign health check from a HIPAA-regulated patient record lookup.

- **No sensitivity metadata** — Tools carry no data classification labels; hosts cannot distinguish PII from public data
- **No destination awareness** — The protocol does not express where tool outputs may flow (storage, third-party APIs, user display)
- **No policy enforcement surface** — Without structured annotations, security policies cannot be evaluated at the protocol level
- **LLM-dependent safety** — Data flow decisions rely entirely on the LLM's interpretation of tool descriptions, which can be bypassed by prompt injection, hallucination, or inadequate descriptions

See the [Problem Statement](docs/problem-statement.md) for full details.

## Repository Contents

| Document | Description |
| :--- | :--- |
| [Problem Statement](docs/problem-statement.md) | Current limitations and gaps |
| [Use Cases](docs/use-cases.md) | Key use cases driving this work |
| [Approaches](docs/approaches.md) | Approaches being explored (not mutually exclusive) |
| [Open Questions](docs/open-questions.md) | Unresolved questions with community input |
| [Experimental Findings](docs/experimental-findings.md) | Results from implementations and testing |
| [Related Work](docs/related-work.md) | SEPs, implementations, and external resources |
| [Contributing](CONTRIBUTING.md) | How to participate |
| [Python SDK](sdk/python/) | Reference implementation (SEP-1913, 138 tests) |

## Facilitators

| Role | Name | Organization | GitHub |
| :--- | :--- | :--- | :--- |
| TO BE ADDED | TO BE ADDED | TO BE ADDED | TO BE ADDED |


## Lifecycle

**Current Status: Active Exploration**

### Graduation Criteria (IG → WG)

This IG may propose becoming a Working Group if:

- Clear consensus emerges on an annotation schema requiring sustained spec work
- Cross-cutting coordination requires formal authority delegation
- At least two Core Maintainers sponsor WG formation

### Retirement Criteria

- Problem space resolved (conventions established, absorbed into other WGs)
- Insufficient participation to maintain momentum
- Community consensus that tool annotations don't belong in MCP protocol scope

## Work Tracking

| Item | Status | Champion | Notes |
| :--- | :--- | :--- | :--- |
| Repository scaffolding | Done | All facilitators | Align structure with other experimental-ext repos |
| Problem statement & use cases | Done | All facilitators | Document motivating scenarios and constraints |
| SEP-1913 reference SDK | Done | TBD | Python SDK implementing trust & sensitivity annotations (138 tests passing) |
| Experimental findings | Proposed | TBD | Usability study results and implementation learnings |

## Success Criteria

- **Short-term:** Documented consensus on requirements and evaluation of existing annotation approaches
- **Medium-term:** Clear recommendation (annotation schema convention vs. protocol extension vs. both)
- **Long-term:** Interoperable tool annotation convention across MCP servers and clients

11 changes: 11 additions & 0 deletions docs/approaches.md
@@ -0,0 +1,11 @@
[Problem Statement](problem-statement.md) |
[Use Cases](use-cases.md) |
Approaches |
[Open Questions](open-questions.md) |
[Experimental Findings](experimental-findings.md) |
[Related Work](related-work.md) |
[Contributing](../CONTRIBUTING.md)

# Approaches Being Explored

*This section will document annotation schema designs, policy enforcement models, and sensitivity propagation strategies being considered by the Interest Group.*
11 changes: 11 additions & 0 deletions docs/experimental-findings.md
@@ -0,0 +1,11 @@
[Problem Statement](problem-statement.md) |
[Use Cases](use-cases.md) |
[Approaches](approaches.md) |
[Open Questions](open-questions.md) |
Experimental Findings |
[Related Work](related-work.md) |
[Contributing](../CONTRIBUTING.md)

# Experimental Findings

*This section will document results from implementations and testing.*
11 changes: 11 additions & 0 deletions docs/open-questions.md
@@ -0,0 +1,11 @@
[Problem Statement](problem-statement.md) |
[Use Cases](use-cases.md) |
[Approaches](approaches.md) |
Open Questions |
[Experimental Findings](experimental-findings.md) |
[Related Work](related-work.md) |
[Contributing](../CONTRIBUTING.md)

# Open Questions

*This section will track unresolved questions requiring community input.*
56 changes: 56 additions & 0 deletions docs/problem-statement.md
@@ -0,0 +1,56 @@
Problem Statement |
[Use Cases](use-cases.md) |
[Approaches](approaches.md) |
[Open Questions](open-questions.md) |
[Experimental Findings](experimental-findings.md) |
[Related Work](related-work.md) |
[Contributing](../CONTRIBUTING.md)

# Problem Statement

MCP tools today are semantically opaque when it comes to data sensitivity. A tool's definition includes its name, description, and input schema — but nothing about the nature of the data it returns or processes. Consider these two tools:

```jsonc
// Tool 1: System health check
{
  "name": "health_check",
  "description": "Check if the system is running",
  "inputSchema": {}
}

// Tool 2: Patient record lookup
{
  "name": "patient_lookup",
  "description": "Look up a patient record from the EHR",
  "inputSchema": {
    "type": "object",
    "properties": { "patient_id": { "type": "string" } }
  }
}
```

At the protocol level, these tools are indistinguishable in terms of data sensitivity. Both are just callable functions. Yet one returns benign system status, while the other returns Protected Health Information (PHI) regulated under HIPAA. There is no machine-readable signal that allows a host to differentiate between them.
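
To illustrate what such a signal could look like, here is a sketch of the same two tools with a hypothetical sensitivity annotation attached. The field names (`annotations`, `sensitivity`, `regulation`) are illustrative assumptions, not the SEP-1913 or MCP schema:

```python
# Hypothetical annotated tool definitions; the "annotations" field names
# are illustrative, not taken from SEP-1913 or the MCP specification.
health_check = {
    "name": "health_check",
    "description": "Check if the system is running",
    "inputSchema": {},
    "annotations": {"sensitivity": "public"},
}

patient_lookup = {
    "name": "patient_lookup",
    "description": "Look up a patient record from the EHR",
    "inputSchema": {
        "type": "object",
        "properties": {"patient_id": {"type": "string"}},
    },
    "annotations": {"sensitivity": "regulated", "regulation": "HIPAA"},
}

def is_sensitive(tool: dict) -> bool:
    # The host can now branch on structured metadata instead of prose.
    return tool.get("annotations", {}).get("sensitivity", "public") != "public"
```

With even this minimal metadata, a host can route `patient_lookup` through stricter handling without parsing its description.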

## Key Gaps

### No Sensitivity Metadata

Tools carry no data classification labels. A host cannot distinguish a tool returning public system metrics from one returning PII, financial records, or credentials. Without classification, all tool outputs are treated equally — which means sensitive data receives no additional protection.

### No Destination Awareness

The protocol does not express where tool outputs may flow. A tool that sends an email to an external address, writes to a third-party API, or stores data in a public bucket looks identical to one that only returns data to the user. Hosts have no structured way to assess data egress risk.

### No Policy Enforcement Surface

Without structured annotations, security policies cannot be evaluated at the protocol level. Organizations cannot express rules like "never send credentials to external destinations" or "require user confirmation before forwarding PII" — because the primitives to describe sensitivity and destination don't exist.
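Once sensitivity and destination are structured, rules like the two quoted above become mechanically checkable. The sketch below assumes tools carry (classification, destination) annotations; the rule tuples and action names are illustrative, not a defined MCP policy language:

```python
# Sketch of host-side policy evaluation over annotated tool calls.
# Rule format and action vocabulary are assumptions for illustration.
POLICY = [
    ("credentials", "external", "deny"),  # never send credentials externally
    ("pii", "external", "confirm"),       # require user confirmation for PII egress
]

def evaluate(classification: str, destination: str) -> str:
    for rule_class, rule_dest, action in POLICY:
        if (classification, destination) == (rule_class, rule_dest):
            return action
    return "allow"
```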

### LLM-Dependent Safety

Data flow decisions currently rely entirely on the LLM's interpretation of tool descriptions. This can be bypassed by:

- **Prompt injection** — Malicious content in tool outputs can instruct the model to misroute data
- **Hallucination** — The model may fabricate a safe interpretation of an ambiguous description
- **Inadequate descriptions** — Tool authors may not describe sensitivity in human-readable text

⚠️ The common thread: the host has no structured metadata to make informed decisions about data flow.
11 changes: 11 additions & 0 deletions docs/related-work.md
@@ -0,0 +1,11 @@
[Problem Statement](problem-statement.md) |
[Use Cases](use-cases.md) |
[Approaches](approaches.md) |
[Open Questions](open-questions.md) |
[Experimental Findings](experimental-findings.md) |
Related Work |
[Contributing](../CONTRIBUTING.md)

# Related Work

*This section will track SEPs, implementations, and external resources related to tool annotations.*
39 changes: 39 additions & 0 deletions docs/use-cases.md
@@ -0,0 +1,39 @@
[Problem Statement](problem-statement.md) |
Use Cases |
[Approaches](approaches.md) |
[Open Questions](open-questions.md) |
[Experimental Findings](experimental-findings.md) |
[Related Work](related-work.md) |
[Contributing](../CONTRIBUTING.md)

# Use Cases

## UC-1: Healthcare Data Protection

A healthcare MCP server exposes tools for patient lookup, insurance processing, and staff directory queries. A host needs to enforce HIPAA-compliant data handling: patient data must not be forwarded to external services, and tools returning Protected Health Information (PHI) must be distinguishable from those returning non-sensitive data.

**Requires:** Data classification (`regulated(HIPAA)`), destination constraints, policy enforcement.
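A host could enforce the forwarding constraint by filtering which tools it exposes in contexts that may egress data. This is a sketch under assumed annotation fields (`annotations`, `regulation`), not the SEP-1913 schema:

```python
# Hypothetical filter: hide HIPAA-regulated tools from an agent context
# that may forward data to external services. Field names are illustrative.
def external_safe(tools: list[dict]) -> list[dict]:
    return [
        t for t in tools
        if t.get("annotations", {}).get("regulation") != "HIPAA"
    ]

tools = [
    {"name": "staff_directory", "annotations": {}},
    {"name": "patient_lookup", "annotations": {"regulation": "HIPAA"}},
]
```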

## UC-2: Multi-Agent Data Leak Prevention

In a multi-agent architecture, Agent A retrieves sensitive credentials from a vault tool and Agent B has access to an external email tool. Without sensitivity metadata and propagation tracking, there is no protocol-level mechanism to prevent Agent B from exfiltrating credential data it received from Agent A's context.

**Requires:** Sensitivity propagation across tool calls, session-level tracking, policy rules blocking credential egress.
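One candidate mechanism is session-level taint tracking: labels on tool results accumulate in the shared context and gate later calls. The tracker API and annotation fields below are hypothetical, not part of MCP or SEP-1913:

```python
# Sketch of session-level sensitivity propagation ("taint tracking").
class SessionTaint:
    def __init__(self) -> None:
        self.labels: set[str] = set()

    def record_result(self, annotations: dict) -> None:
        # Any sensitivity label on a tool result taints the session context.
        label = annotations.get("sensitivity")
        if label:
            self.labels.add(label)

    def may_call(self, annotations: dict) -> bool:
        # Once credentials entered the context, block external-destination tools.
        if annotations.get("destination") == "external":
            return "credentials" not in self.labels
        return True

session = SessionTaint()
session.record_result({"sensitivity": "credentials"})        # Agent A: vault read
blocked = not session.may_call({"destination": "external"})  # Agent B: email tool
```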

## UC-3: Audit Logging for Compliance

An organization must maintain audit trails showing which tools accessed sensitive data, what classification the data had, and whether policy rules permitted or blocked the action. Current MCP provides no structured metadata to include in audit events.

**Requires:** Structured annotations on tool calls and results, policy decision logging.
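With annotations in place, each policy decision could emit a structured audit record. The event shape below is an illustrative assumption, not a defined MCP audit schema:

```python
# Sketch of a structured audit event a host could emit per tool call.
import json
from datetime import datetime, timezone

def audit_event(tool: str, classification: str, decision: str) -> str:
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "classification": classification,
        "decision": decision,  # e.g. "allow" | "deny" | "confirm"
    }
    return json.dumps(event)

record = audit_event("patient_lookup", "regulated(HIPAA)", "allow")
```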

## UC-4: User Consent for Sensitive Operations

A tool sends a notification to a patient via SMS. The host should be able to detect — from annotations, not just the tool description — that this tool has an external destination and involves PII, and therefore should prompt the user for confirmation before execution.

**Requires:** Destination metadata (`external`), sensitivity labels, client-side policy evaluation.
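A consent gate over such annotations could be as small as the sketch below; the field names and label values are illustrative, not the SEP-1913 schema:

```python
# Hypothetical consent check: prompt the user only when a tool both sends
# data to an external destination and touches PII/PHI.
def needs_confirmation(annotations: dict) -> bool:
    external = annotations.get("destination") == "external"
    sensitive = annotations.get("sensitivity") in {"pii", "phi"}
    return external and sensitive

sms_notify = {"destination": "external", "sensitivity": "pii"}
```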

## UC-5: Progressive Adoption

Not all MCP server authors need full trust annotations. A simple internal tool may need no annotations at all, while a healthcare integration requires full classification, policy, and audit. The annotation system must support progressive adoption — from zero annotations to full enforcement — without requiring all-or-nothing commitment.

**Requires:** Optional annotations, graceful degradation, layered adoption levels.
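
Graceful degradation can be as simple as a well-defined default for missing annotations. Treating unannotated tools as "unclassified" is an assumption here, not a decided convention:

```python
# Sketch of graceful degradation: unannotated tools keep working and are
# treated as "unclassified". The default label is an illustrative assumption.
def classification(tool: dict) -> str:
    return tool.get("annotations", {}).get("sensitivity", "unclassified")

legacy_tool = {"name": "echo"}  # no annotations at all
annotated_tool = {
    "name": "patient_lookup",
    "annotations": {"sensitivity": "regulated"},
}
```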