
Code Execution with MCP

Building AI agents that interact with multiple tools and data sources presents a fundamental challenge: context window constraints. When agents need access to dozens of tools, traditional approaches consume excessive tokens by loading all tool definitions upfront and passing intermediate results through the context window repeatedly.

Anthropic recently published research on code execution with the Model Context Protocol (MCP), demonstrating a 98.7% reduction in context overhead by representing tools as discoverable code rather than verbose JSON schemas. This post presents a practical implementation of these concepts using Amazon Bedrock AgentCore Code Interpreter, showing how filesystem-based tool discovery enables progressive capability loading while achieving similar efficiency gains.

When building AI agents with extensive tool capabilities, we face a fundamental constraint: context windows. Consider an agent that needs access to 22 different tools across GitHub and Slack operations. The traditional approach requires sending complete tool definitions to the model with every request.

Code execution fundamentally changes this dynamic. Instead of sending verbose JSON schemas describing every tool, we can represent tools as Python functions that the agent discovers and imports on demand.

View more here: https://medium.com/@madhur.prashant7/scaling-agents-with-code-execution-and-the-model-context-protocol-a4c263fa7f61?postPublishedType=initial

Overview

This project implements the Code Execution with MCP pattern where an AI agent writes Python code that discovers and uses tools from a filesystem-organized registry. Instead of sending all tool definitions to the model (15,000+ tokens), the agent explores a directory structure and imports only what it needs, achieving 90%+ token reduction.
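
As a rough illustration of this idea, the agent can enumerate the registry with standard-library calls and import a single tool only when a task requires it. The helper below is a hypothetical sketch, not part of this repository's API; it assumes each service is a subpackage of mcp_registry that exposes its tools as module-level functions, as the Quick Start examples suggest.

import importlib
import inspect
import pkgutil

import mcp_registry  # the filesystem-organized tool registry


def list_available_tools() -> dict[str, list[str]]:
    """Hypothetical helper: list the callable tools exposed by each service package."""
    tools = {}
    for service in pkgutil.iter_modules(mcp_registry.__path__):
        module = importlib.import_module(f"mcp_registry.{service.name}")
        tools[service.name] = [
            name for name, obj in inspect.getmembers(module, inspect.isfunction)
            if not name.startswith("_")
        ]
    return tools


# The model only ever sees these names (a few hundred tokens)...
print(list_available_tools())  # e.g. {'github': ['create_issue', ...], 'slack': [...]}

# ...then imports the one function it actually needs.
from mcp_registry.github import create_issue  # loaded on demand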


Key Innovation

Traditional Approach: Send all tool definitions → Model calls tools → Return results

  • Token usage: ~50,000 tokens per multi-step workflow
  • Cost: $7.50 per request
  • Latency: 5-10 seconds (multiple round trips)

Code Execution with MCP: List tool names → Model writes code → Execute in sandbox

  • Token usage: ~650 tokens per multi-step workflow
  • Cost: $0.10 per request
  • Latency: ~500ms (single execution)

MCP Registry

The mcp_registry directory contains tools organized by service:

GitHub Tools (14 tools)

  • Repository Operations: create_or_update_file, push_files, create_repository, fork_repository, create_branch
  • Issue & PR Management: create_issue, create_pull_request, list_issues
  • Search: search_repositories, search_code, search_issues, search_users
  • File Operations: get_file_contents, list_repository_files

Slack Tools (8 tools)

  • Channel Operations: list_channels, get_channel_history
  • Messaging: post_message, reply_to_thread, add_reaction
  • Thread Operations: get_thread_replies
  • User Operations: get_users, get_user_profile

See mcp_registry/README.md for detailed usage instructions.
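
Each entry in the registry is a typed Python function rather than a JSON schema. The sketch below shows an assumed shape for create_issue, inferred from how it is called in the Quick Start section; it is not the repository's actual implementation.

from github import Github


def create_issue(
    client: Github,
    owner: str,
    repo: str,
    title: str,
    body: str = "",
    labels: list[str] | None = None,
) -> dict:
    """Create a GitHub issue and return a small, filterable result."""
    repository = client.get_repo(f"{owner}/{repo}")
    issue = repository.create_issue(title=title, body=body, labels=labels or [])
    # Return only the fields an agent typically needs, keeping context small.
    return {"issue_number": issue.number, "url": issue.html_url, "state": issue.state}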

Prerequisites

  • Python 3.11+
  • AWS credentials configured (for Amazon Bedrock)
  • GitHub Personal Access Token
  • Slack Bot Token

Getting Started

1. Install uv and Set Up Python Environment

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Create virtual environment and sync dependencies
uv venv && source .venv/bin/activate && uv pip sync pyproject.toml

# Set environment variable for uv
export UV_PROJECT_ENVIRONMENT=.venv

# Install additional dependencies
uv add zmq

# Install Jupyter kernel (if needed)
python -m ipykernel install --user --name=.venv --display-name="Python (uv env)"

2. Install Project Dependencies

Dependencies are already defined in pyproject.toml and will be installed during the sync step above. The project includes:

  • Core: PyGithub, slack-sdk, pydantic, langchain, langchain-aws, bedrock-agentcore
  • Dev: pytest, pytest-asyncio, ruff, mypy

Configuration

3. Create Environment File

Copy the example environment file and fill in your credentials:

cp .env.example .env

4. Set Up GitHub Token

  1. Go to https://github.com/settings/tokens
  2. Click "Generate new token" → "Generate new token (classic)"
  3. Select scopes based on your needs:
    • repo - Full control of private repositories
    • public_repo - Access public repositories
    • write:discussion - Create and manage discussions
  4. Copy the token and add to .env:
    GITHUB_TOKEN=github_pat_YOUR_TOKEN_HERE
    

5. Set Up Slack Bot Token

  1. Go to https://api.slack.com/apps
  2. Create a new app or select existing app
  3. Go to "OAuth & Permissions"
  4. Add bot token scopes:
    • channels:read - View basic channel info
    • channels:history - View messages in public channels
    • chat:write - Send messages
    • users:read - View people in workspace
  5. Install app to workspace and copy "Bot User OAuth Token"
  6. Add to .env:
    SLACK_BOT_TOKEN=xoxb-YOUR-TOKEN-HERE
    

6. AWS Configuration

Ensure AWS credentials are configured:

# Configure AWS CLI
aws configure

# Or set environment variables
export AWS_REGION=us-east-1
export AWS_PROFILE=default
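
To confirm that credentials and region resolve correctly before running the agent, a quick standalone check with boto3 can help. This snippet is not part of the repository; it only verifies the identity the SDK will use for Bedrock calls.

import boto3

session = boto3.Session()
identity = session.client("sts").get_caller_identity()

# Prints the AWS account and region that Bedrock requests will run against.
print(f"Account: {identity['Account']}")
print(f"Region:  {session.region_name}")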

IMPORTANT: Never commit the .env file to version control. It is already listed in .gitignore.

Quick Start

GitHub Example

from github import Github
from mcp_registry.github import create_issue, push_files
import os

# Initialize client
client = Github(os.getenv("GITHUB_TOKEN"))

# Create an issue
result = create_issue(
    client=client,
    owner="username",
    repo="repository",
    title="Bug report",
    body="Description of the bug",
    labels=["bug"]
)

print(f"Created issue: {result['url']}")

Slack Example

from slack_sdk import WebClient
from mcp_registry.slack import post_message, list_channels
import os

# Initialize client
client = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))

# List channels
channels = list_channels(client=client, limit=50)

# Post message
result = post_message(
    client=client,
    channel_id="C1234567890",
    text="Hello from MCP tools!"
)

print(f"Message posted at: {result['timestamp']}")

Testing

Running Tests

The project includes a test suite that can be run using the provided test script or directly with pytest:

# Run all tests using the test script
./run_tests.sh

Testing MCP Integration

There are two main ways to test MCP functionality:

1. Regular MCP Server Testing

Test the MCP server directly using the MCP inspector or client tools:

# Start MCP server (if applicable)
uv run python -m mcp_server

# Test with MCP inspector
npx @modelcontextprotocol/inspector

2. Configuration-Based MCP Testing

Test MCP integration through configuration files:

Update Claude Desktop Config (~/.claude/config.json or similar):

{
  "mcpServers": {
    "code-execution-mcp": {
      "command": "uv",
      "args": ["run", "python", "-m", "mcp_server"],
      "env": {
        "UV_PROJECT_ENVIRONMENT": ".venv"
      }
    }
  }
}

To modify the MCP server configuration:

  1. Edit the config file in ~/.claude/config.json (or your Claude config location)
  2. Update the server name, command, arguments, or environment variables
  3. Restart Claude Desktop to pick up changes
  4. Test the new configuration

Example custom configuration:

{
  "mcpServers": {
    "custom-mcp-name": {
      "command": "uv",
      "args": ["run", "python", "custom_mcp.py"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}",
        "SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}",
        "UV_PROJECT_ENVIRONMENT": ".venv"
      }
    }
  }
}

Running Code Directly

You can also run Python code directly for quick testing:

# Run main application
uv run python code_exec_with_mcp_agent.py

Comparing with Regular MCP Approach

To compare the performance and token usage between the code execution approach and the regular MCP approach:

# Run the regular MCP comparison tests
./regular_mcp/run_tests.sh

This will run the same test suite using the traditional MCP approach (sending all tool definitions to the model) and generate comparison results. You can then compare:

  • Token Usage: Regular MCP uses 100,000+ tokens vs Code Execution MCP uses ~27,000 tokens
  • Cost Efficiency: Regular MCP costs ~$0.36/request vs Code Execution MCP costs ~$0.11/request
  • Response Quality: Compare accuracy and completeness of responses

See the examples in the Comparison Results section below for detailed metrics.

Comparison Results

Below are real examples comparing the two approaches on the same task:

Task: Analyze ALL repositories owned by madhurprash and identify the top 5 most starred repositories

Code Execution with MCP Approach

Metrics:

  • Latency: 53.49 seconds
  • Input Tokens: 26,888
  • Output Tokens: 1,888
  • Total Tokens: 28,776
  • Estimated Cost: $0.109

Result Quality:

  • ✅ Correctly analyzed all 83 repositories
  • ✅ Provided exact star counts (6, 5, 3, 3, 3)
  • ✅ Correctly identified primary languages (Jupyter Notebook, Python)
  • ✅ Complete and accurate descriptions
  • ✅ Generated executable Python code for verification

Generated Code: The agent generated Python code (sketched after this list) that:

  1. Used PyGithub to fetch all repositories
  2. Sorted by star count
  3. Extracted top 5 with complete metadata
  4. Formatted results in both human-readable and JSON formats
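
A minimal sketch of what that generated code might look like (not the agent's verbatim output; it assumes GITHUB_TOKEN is set as described in the configuration steps):

import json
import os

from github import Github

client = Github(os.getenv("GITHUB_TOKEN"))

# Fetch every repository owned by the user and sort by star count.
repos = list(client.get_user("madhurprash").get_repos())
top_5 = sorted(repos, key=lambda r: r.stargazers_count, reverse=True)[:5]

summary = [
    {
        "name": r.full_name,
        "stars": r.stargazers_count,
        "language": r.language,
        "description": r.description,
    }
    for r in top_5
]

# Human-readable output plus JSON, mirroring the agent's run.
for entry in summary:
    print(f"{entry['stars']:>3} stars  {entry['name']} ({entry['language']})")
print(json.dumps(summary, indent=2))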

Regular MCP Approach

Metrics:

  • Latency: 37.51 seconds
  • Input Tokens: 112,837
  • Output Tokens: 1,241
  • Total Tokens: 114,078
  • Estimated Cost: $0.357

Result Quality:

  • ⚠️ Could not determine exact star counts
  • ⚠️ Had to rely on GitHub API sort order
  • ⚠️ Less precise language identification
  • ⚠️ No executable code generated for verification

Key Observations:

  • The API response didn't include actual star counts
  • Had to make assumptions based on sort order
  • Results were less precise and verifiable

Key Takeaways

Metric              Code Execution MCP   Regular MCP         Improvement
Token Usage         28,776               114,078             74.8% reduction
Cost per Request    $0.109               $0.357              69.5% savings
Accuracy            Exact star counts    Approximate order   More precise
Verifiability       Generated code       No code             More transparent
Latency             53.5s                37.5s               42% slower*

* While code execution has higher latency on this task due to code generation and sandbox execution, the dramatic cost savings and improved accuracy make it the preferred approach for most use cases.

Why Code Execution MCP is Better

  1. Dramatic Cost Reduction: 69.5% lower cost per request
  2. Better Accuracy: Direct API access provides exact data
  3. Transparency: Generated code can be reviewed and verified
  4. Flexibility: Agent can adapt code based on API responses
  5. Future-Proof: Code can handle API changes dynamically

Design Philosophy

This implementation follows the "Code Execution with MCP" pattern:

  1. Minimal Token Usage: Tools are Python functions, not verbose schemas
  2. Progressive Discovery: Import only what you need, when you need it
  3. Direct Execution: Simple function calls, no middleware
  4. Result Filtering: Structured returns for easy data filtering
  5. Stateless Tools: Each tool is independent and composable (see the sketch below)
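
To illustrate the last two points, tools from different services compose naturally inside a single script, and only a compact summary ever needs to travel back through the model. The sketch below reuses the client setup from the Quick Start; the exact signature and return type of search_issues are assumptions, not the repository's documented API.

import os

from github import Github
from slack_sdk import WebClient

from mcp_registry.github import search_issues
from mcp_registry.slack import post_message

gh = Github(os.getenv("GITHUB_TOKEN"))
slack = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))

# Find open bugs, then post a one-line summary to Slack.
# Assumes search_issues returns a list-like collection of matches.
issues = search_issues(client=gh, query="repo:username/repository is:open label:bug")
summary = f"{len(issues)} open bug(s) in username/repository"
post_message(client=slack, channel_id="C1234567890", text=summary)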

Token Optimization

Traditional approach:

Load all 22 tool schemas → 50,000+ tokens

MCP approach:

List available tools → 650 tokens
Import specific tool → Execute directly

This represents a 98.7% reduction in token usage.

Usage

Running the Agent

# Interactive mode
uv run python code_exec_with_mcp_agent.py

# Single query mode
uv run python code_exec_with_mcp_agent.py --query "List all GitHub repositories for user anthropics"

# With custom model
uv run python code_exec_with_mcp_agent.py --model anthropic.claude-haiku-4-5-20251001-v1:0

# Enable debug logging
uv run python code_exec_with_mcp_agent.py --debug

The agent will use the filesystem tools to progressively discover MCP tools and execute code to accomplish your tasks.

Integration with Amazon Bedrock AgentCore

These tools are designed to work with Amazon Bedrock AgentCore Code Interpreter for secure, sandboxed execution:

# In AgentCore sandbox
from mcp_registry.github import create_issue


def run_tool():
    # Execute tool
    result = create_issue(...)

    # Filter result before returning to model
    return {
        "issue_number": result["issue_number"],
        "url": result["url"]
    }

Implementation Status

✅ Completed

  • MCP Registry with 22 tools (GitHub: 14, Slack: 8)
  • Filesystem-based tool organization
  • Type-safe implementations with comprehensive docs

🚧 In Progress

  • Amazon Bedrock integration (Claude Sonnet 4.5)
  • AgentCore Code Interpreter integration
  • REST API server
  • Deployment automation

See IMPLEMENTATION_DESIGN.md for the complete roadmap.

