Building AI agents that interact with multiple tools and data sources presents a fundamental challenge: context window constraints. When agents need access to dozens of tools, traditional approaches consume excessive tokens by loading all tool definitions upfront and passing intermediate results through the context window repeatedly.
Anthropic recently published research on code execution with the Model Context Protocol (MCP), demonstrating a 98.7% reduction in context overhead by representing tools as discoverable code rather than verbose JSON schemas. This post presents a practical implementation of these concepts using Amazon Bedrock AgentCore Code Interpreter, showing how filesystem-based tool discovery enables progressive capability loading while achieving similar efficiency gains.
When building AI agents with extensive tool capabilities, we face a fundamental constraint: context windows. Consider an agent that needs access to 22 different tools across GitHub and Slack operations. The traditional approach requires sending complete tool definitions to the model with every request.
Code execution fundamentally changes this dynamic. Instead of sending verbose JSON schemas describing every tool, we can represent tools as Python functions that the agent discovers and imports on demand.
Read the full blog post here: https://medium.com/@madhur.prashant7/scaling-agents-with-code-execution-and-the-model-context-protocol-a4c263fa7f61?postPublishedType=initial
This project implements the Code Execution with MCP pattern where an AI agent writes Python code that discovers and uses tools from a filesystem-organized registry. Instead of sending all tool definitions to the model (15,000+ tokens), the agent explores a directory structure and imports only what it needs, achieving 90%+ token reduction.
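To make that concrete, here is a minimal sketch of the kind of discovery-then-import code the agent can generate inside the sandbox. It is illustrative only and assumes mcp_registry is importable as a regular Python package on the sandbox path:

```python
# Minimal sketch of progressive tool discovery (illustrative; assumes
# mcp_registry is an importable Python package on the sandbox path).
import importlib
import pkgutil

import mcp_registry

# Step 1: discover which services exist by walking the registry package
services = [m.name for m in pkgutil.iter_modules(mcp_registry.__path__)]
print(services)  # e.g. ['github', 'slack']

# Step 2: import only the tool the current task needs
github_tools = importlib.import_module("mcp_registry.github")
create_issue = getattr(github_tools, "create_issue")
```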
Traditional Approach: Send all tool definitions → Model calls tools → Return results
- Token usage: ~50,000 tokens per multi-step workflow
- Cost: $7.50 per request
- Latency: 5-10 seconds (multiple round trips)
Code Execution with MCP: List tool names → Model writes code → Execute in sandbox
- Token usage: ~650 tokens per multi-step workflow
- Cost: $0.10 per request
- Latency: ~500ms (single execution)
The mcp_registry directory contains tools organized by service:
GitHub tools (14):
- Repository Operations: create_or_update_file, push_files, create_repository, fork_repository, create_branch
- Issue & PR Management: create_issue, create_pull_request, list_issues
- Search: search_repositories, search_code, search_issues, search_users
- File Operations: get_file_contents, list_repository_files
Slack tools (8):
- Channel Operations: list_channels, get_channel_history
- Messaging: post_message, reply_to_thread, add_reaction
- Thread Operations: get_thread_replies
- User Operations: get_users, get_user_profile
See mcp_registry/README.md for detailed usage instructions.
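Because the tools are grouped by service, imports in agent-generated code simply mirror the directory layout. For example (tool names taken from the list above):

```python
# Imports mirror the registry layout: one subpackage per service
from mcp_registry.github import search_repositories, get_file_contents
from mcp_registry.slack import post_message, get_channel_history
```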
- Python 3.11+
- AWS credentials configured (for Amazon Bedrock)
- GitHub Personal Access Token
- Slack Bot Token
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
# Create virtual environment and sync dependencies
uv venv && source .venv/bin/activate && uv pip sync pyproject.toml
# Set environment variable for uv
export UV_PROJECT_ENVIRONMENT=.venv
# Install additional dependencies
uv add zmq
# Install Jupyter kernel (if needed)
python -m ipykernel install --user --name=.venv --display-name="Python (uv env)"
Dependencies are already defined in pyproject.toml and will be installed during the sync step above. The project includes:
- Core: PyGithub, slack-sdk, pydantic, langchain, langchain-aws, bedrock-agentcore
- Dev: pytest, pytest-asyncio, ruff, mypy
Copy the example environment file and fill in your credentials:
cp .env.example .env
GitHub Personal Access Token:
- Go to https://github.com/settings/tokens
- Click "Generate new token" → "Generate new token (classic)"
- Select scopes based on your needs:
  - repo - Full control of private repositories
  - public_repo - Access public repositories
  - write:discussion - Create and manage discussions
- Copy the token and add to .env:
GITHUB_TOKEN=github_pat_YOUR_TOKEN_HERE
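A quick way to confirm the token works before running the agent is a one-off PyGithub call (this assumes GITHUB_TOKEN has been exported into your shell environment):

```python
# Sanity-check the GitHub token (assumes GITHUB_TOKEN is set in the environment)
import os
from github import Github

client = Github(os.getenv("GITHUB_TOKEN"))
print(client.get_user().login)  # prints your username if the token is valid
```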
Slack Bot Token:
- Go to https://api.slack.com/apps
- Create a new app or select existing app
- Go to "OAuth & Permissions"
- Add bot token scopes:
  - channels:read - View basic channel info
  - channels:history - View messages in public channels
  - chat:write - Send messages
  - users:read - View people in workspace
- Install app to workspace and copy "Bot User OAuth Token"
- Add to .env:
SLACK_BOT_TOKEN=xoxb-YOUR-TOKEN-HERE
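Similarly, the bot token can be verified with a single auth_test call (assumes SLACK_BOT_TOKEN is exported into the environment):

```python
# Sanity-check the Slack bot token (assumes SLACK_BOT_TOKEN is set in the environment)
import os
from slack_sdk import WebClient

client = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))
resp = client.auth_test()  # raises SlackApiError if the token is rejected
print(resp["team"], resp["user"])
```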
Ensure AWS credentials are configured:
# Configure AWS CLI
aws configure
# Or set environment variables
export AWS_REGION=us-east-1
export AWS_PROFILE=default
IMPORTANT: Never commit the .env file to version control. It's already in .gitignore.
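Before running the agent, a quick boto3 check (boto3 is pulled in by langchain-aws and bedrock-agentcore) confirms that credentials and region resolve correctly:

```python
# Verify AWS credentials and region before the agent calls Bedrock
import boto3

session = boto3.Session()
print("Region:", session.region_name)  # should match AWS_REGION
print("Account:", boto3.client("sts").get_caller_identity()["Account"])  # fails fast if credentials are missing
```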
from github import Github
from mcp_registry.github import create_issue, push_files
import os
# Initialize client
client = Github(os.getenv("GITHUB_TOKEN"))
# Create an issue
result = create_issue(
client=client,
owner="username",
repo="repository",
title="Bug report",
body="Description of the bug",
labels=["bug"]
)
print(f"Created issue: {result['url']}")from slack_sdk import WebClient
from mcp_registry.slack import post_message, list_channels
import os
# Initialize client
client = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))
# List channels
channels = list_channels(client=client, limit=50)
# Post message
result = post_message(
client=client,
channel_id="C1234567890",
text="Hello from MCP tools!"
)
print(f"Message posted at: {result['timestamp']}")The project includes a test suite that can be run using the provided test script or directly with pytest:
# Run all tests using the test script
./run_tests.sh
There are two main ways to test MCP functionality:
Test the MCP server directly using the MCP inspector or client tools:
# Start MCP server (if applicable)
uv run python -m mcp_server
# Test with MCP inspector
npx @modelcontextprotocol/inspector
Test MCP integration through configuration files:
Update Claude Desktop Config (~/.claude/config.json or similar):
{
"mcpServers": {
"code-execution-mcp": {
"command": "uv",
"args": ["run", "python", "-m", "mcp_server"],
"env": {
"UV_PROJECT_ENVIRONMENT": ".venv"
}
}
}
}
To modify the MCP server configuration:
- Edit the config file in ~/.claude/config.json (or your Claude config location)
- Update the server name, command, arguments, or environment variables
- Restart Claude Desktop to pick up changes
- Test the new configuration
Example custom configuration:
{
"mcpServers": {
"custom-mcp-name": {
"command": "uv",
"args": ["run", "python", "custom_mcp.py"],
"env": {
"GITHUB_TOKEN": "${GITHUB_TOKEN}",
"SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}",
"UV_PROJECT_ENVIRONMENT": ".venv"
}
}
}
}
You can also run Python code directly for quick testing:
# Run main application
uv run python code_exec_with_mcp_agent.py
To compare the performance and token usage between the code execution approach and the regular MCP approach:
# Run the regular MCP comparison tests
./regular_mcp/run_tests.sh
This will run the same test suite using the traditional MCP approach (sending all tool definitions to the model) and generate comparison results. You can then compare:
- Token Usage: Regular MCP uses 100,000+ tokens vs ~27,000 tokens for Code Execution MCP
- Cost Efficiency: Regular MCP costs ~$0.36/request vs ~$0.11/request for Code Execution MCP
- Response Quality: Compare accuracy and completeness of responses
See the examples in the Comparison Results section below for detailed metrics.
Below are real examples comparing the two approaches on the same task:
Task: Analyze ALL repositories owned by madhurprash and identify the top 5 most starred repositories
Code Execution MCP approach:
Metrics:
- Latency: 53.49 seconds
- Input Tokens: 26,888
- Output Tokens: 1,888
- Total Tokens: 28,776
- Estimated Cost: $0.109
Result Quality:
- ✅ Correctly analyzed all 83 repositories
- ✅ Provided exact star counts (6, 5, 3, 3, 3)
- ✅ Correctly identified primary languages (Jupyter Notebook, Python)
- ✅ Complete and accurate descriptions
- ✅ Generated executable Python code for verification
Generated Code: The agent generated Python code (sketched after this list) that:
- Used PyGithub to fetch all repositories
- Sorted by star count
- Extracted top 5 with complete metadata
- Formatted results in both human-readable and JSON formats
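For reference, the generated code followed roughly this shape. It is a reconstruction based on the bullets above, not the agent's verbatim output:

```python
# Reconstruction of the agent-generated analysis code (illustrative only)
import json
import os

from github import Github

client = Github(os.getenv("GITHUB_TOKEN"))
user = client.get_user("madhurprash")

# Fetch all repositories owned by the user with the metadata we care about
repos = [
    {
        "name": repo.name,
        "stars": repo.stargazers_count,
        "language": repo.language,
        "description": repo.description,
    }
    for repo in user.get_repos()
]

# Sort by star count and keep the top 5
top_5 = sorted(repos, key=lambda r: r["stars"], reverse=True)[:5]

# Emit both human-readable and JSON output
for repo in top_5:
    print(f"{repo['name']}: {repo['stars']} stars ({repo['language']})")
print(json.dumps(top_5, indent=2))
```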
Regular MCP approach:
Metrics:
- Latency: 37.51 seconds
- Input Tokens: 112,837
- Output Tokens: 1,241
- Total Tokens: 114,078
- Estimated Cost: $0.357
Result Quality:
- ⚠️ Could not determine exact star counts
- ⚠️ Had to rely on GitHub API sort order
- ⚠️ Less precise language identification
- ⚠️ No executable code generated for verification
Key Observations:
- The API response didn't include actual star counts
- Had to make assumptions based on sort order
- Results were less precise and verifiable
| Metric | Code Execution MCP | Regular MCP | Improvement |
|---|---|---|---|
| Token Usage | 28,776 | 114,078 | 74.8% reduction |
| Cost per Request | $0.109 | $0.357 | 69.5% savings |
| Accuracy | Exact star counts | Approximate order | More precise |
| Verifiability | Generated code | No code | More transparent |
| Latency | 53.5s | 37.5s | 42% slower* |
* While code execution had higher latency in this example (53.5s vs 37.5s) due to code generation and execution, the dramatic cost savings and improved accuracy make it the preferred approach for most use cases.
- Dramatic Cost Reduction: 69.5% lower cost per request
- Better Accuracy: Direct API access provides exact data
- Transparency: Generated code can be reviewed and verified
- Flexibility: Agent can adapt code based on API responses
- Future-Proof: Code can handle API changes dynamically
This implementation follows the "Code Execution with MCP" pattern:
- Minimal Token Usage: Tools are Python functions, not verbose schemas
- Progressive Discovery: Import only what you need, when you need it
- Direct Execution: Simple function calls, no middleware
- Result Filtering: Structured returns for easy data filtering
- Stateless Tools: Each tool is independent and composable (see the sketch below)
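To make these principles concrete, here is a sketch of what a registry tool can look like. The signature is inferred from the usage example earlier in this README, so the actual implementation in mcp_registry/github may differ:

```python
# Sketch of a stateless, type-annotated registry tool with a filtered, structured return
# (signature inferred from the usage examples above; not the project's exact implementation)
from typing import Any

from github import Github


def create_issue(
    client: Github,
    owner: str,
    repo: str,
    title: str,
    body: str = "",
    labels: list[str] | None = None,
) -> dict[str, Any]:
    """Create a GitHub issue and return only the fields the agent needs."""
    repository = client.get_repo(f"{owner}/{repo}")
    issue = repository.create_issue(title=title, body=body, labels=labels or [])
    return {"issue_number": issue.number, "url": issue.html_url}
```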
Traditional approach:
Load all 22 tool schemas → 50,000+ tokens
Code execution approach:
List available tools → 650 tokens
Import specific tool → Execute directly
This represents a 98.7% reduction in token usage.
# Interactive mode
uv run python code_exec_with_mcp_agent.py
# Single query mode
uv run python code_exec_with_mcp_agent.py --query "List all GitHub repositories for user anthropics"
# With custom model
uv run python code_exec_with_mcp_agent.py --model anthropic.claude-haiku-4-5-20251001-v1:0
# Enable debug logging
uv run python code_exec_with_mcp_agent.py --debug
The agent will use the filesystem tools to progressively discover MCP tools and execute code to accomplish your tasks.
These tools are designed to work with Amazon Bedrock AgentCore Code Interpreter for secure, sandboxed execution:
# In AgentCore sandbox
from mcp_registry.github import create_issue
# Execute tool
result = create_issue(...)
# Filter result before returning to model
return {
"issue_number": result["issue_number"],
"url": result["url"]
}
- MCP Registry with 22 tools (GitHub: 14, Slack: 8)
- Filesystem-based tool organization
- Type-safe implementations with comprehensive docs
- Amazon Bedrock integration (Claude Sonnet 4.5)
- AgentCore Code Interpreter integration
- REST API server
- Deployment automation
See IMPLEMENTATION_DESIGN.md for the complete roadmap.
