Building AI agents that interact with multiple tools and data sources presents a fundamental challenge: context window constraints. When agents need access to dozens of tools, traditional approaches consume excessive tokens by loading all tool definitions upfront and passing intermediate results through the context window repeatedly.
Anthropic recently published research on code execution with the Model Context Protocol (MCP), demonstrating a 98.7% reduction in context overhead by representing tools as discoverable code rather than verbose JSON schemas. This post presents a practical implementation of these concepts using Amazon Bedrock AgentCore Code Interpreter, showing how filesystem-based tool discovery enables progressive capability loading while achieving similar efficiency gains.
When building AI agents with extensive tool capabilities, we face a fundamental constraint: context windows. Consider an agent that needs access to 22 different tools across GitHub and Slack operations. The traditional approach requires sending complete tool definitions to the model with every request.
Code execution fundamentally changes this dynamic. Instead of sending verbose JSON schemas describing every tool, we can represent tools as Python functions that the agent discovers and imports on demand.
Read the full blog post here: https://medium.com/@madhur.prashant7/scaling-agents-with-code-execution-and-the-model-context-protocol-a4c263fa7f61?postPublishedType=initial
This project implements the Code Execution with MCP pattern where an AI agent writes Python code that discovers and uses tools from a filesystem-organized registry. Instead of sending all tool definitions to the model (15,000+ tokens), the agent explores a directory structure and imports only what it needs, achieving 90%+ token reduction.
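To make that concrete, here is a minimal sketch of the kind of discovery-then-import code the agent can generate inside the sandbox. It is illustrative only and assumes mcp_registry is importable as a regular Python package on the sandbox path:

```python
# Minimal sketch of progressive tool discovery (illustrative; assumes
# mcp_registry is an importable Python package on the sandbox path).
import importlib
import pkgutil

import mcp_registry

# Step 1: discover which services exist by walking the registry package
services = [m.name for m in pkgutil.iter_modules(mcp_registry.__path__)]
print(services)  # e.g. ['github', 'slack']

# Step 2: import only the tool the current task needs
github_tools = importlib.import_module("mcp_registry.github")
create_issue = getattr(github_tools, "create_issue")
```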
Traditional Approach: Send all tool definitions → Model calls tools → Return results
- Token usage: ~50,000 tokens per multi-step workflow
- Cost: $7.50 per request
- Latency: 5-10 seconds (multiple round trips)
Code Execution with MCP: List tool names → Model writes code → Execute in sandbox
- Token usage: ~650 tokens per multi-step workflow
- Cost: $0.10 per request
- Latency: ~500ms (single execution)
The mcp_registry directory contains tools organized by service:
GitHub tools (14):
- Repository Operations: create_or_update_file, push_files, create_repository, fork_repository, create_branch
- Issue & PR Management: create_issue, create_pull_request, list_issues
- Search: search_repositories, search_code, search_issues, search_users
- File Operations: get_file_contents, list_repository_files
Slack tools (8):
- Channel Operations: list_channels, get_channel_history
- Messaging: post_message, reply_to_thread, add_reaction
- Thread Operations: get_thread_replies
- User Operations: get_users, get_user_profile
See mcp_registry/README.md for detailed usage instructions.
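Because the tools are grouped by service, imports in agent-generated code simply mirror the directory layout. For example (tool names taken from the list above):

```python
# Imports mirror the registry layout: one subpackage per service
from mcp_registry.github import search_repositories, get_file_contents
from mcp_registry.slack import post_message, get_channel_history
```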
- Python 3.11+
- AWS credentials configured (for Amazon Bedrock)
- GitHub Personal Access Token
- Slack Bot Token
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
# Create virtual environment and sync dependencies
uv venv && source .venv/bin/activate && uv pip sync pyproject.toml
# Set environment variable for uv
export UV_PROJECT_ENVIRONMENT=.venv
# Install additional dependencies
uv add zmq
# Install Jupyter kernel (if needed)
python -m ipykernel install --user --name=.venv --display-name="Python (uv env)"
Dependencies are already defined in pyproject.toml and will be installed during the sync step above. The project includes:
- Core: PyGithub, slack-sdk, pydantic, langchain, langchain-aws, bedrock-agentcore
- Dev: pytest, pytest-asyncio, ruff, mypy
Copy the example environment file and fill in your credentials:
cp .env.example .env
GitHub Personal Access Token:
- Go to https://github.com/settings/tokens
- Click "Generate new token" → "Generate new token (classic)"
- Select scopes based on your needs:
  - repo - Full control of private repositories
  - public_repo - Access public repositories
  - write:discussion - Create and manage discussions
- Copy the token and add to .env:
GITHUB_TOKEN=github_pat_YOUR_TOKEN_HERE
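A quick way to confirm the token works before running the agent is a one-off PyGithub call (this assumes GITHUB_TOKEN has been exported into your shell environment):

```python
# Sanity-check the GitHub token (assumes GITHUB_TOKEN is set in the environment)
import os
from github import Github

client = Github(os.getenv("GITHUB_TOKEN"))
print(client.get_user().login)  # prints your username if the token is valid
```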
Slack Bot Token:
- Go to https://api.slack.com/apps
- Create a new app or select existing app
- Go to "OAuth & Permissions"
- Add bot token scopes:
  - channels:read - View basic channel info
  - channels:history - View messages in public channels
  - chat:write - Send messages
  - users:read - View people in workspace
- Install app to workspace and copy "Bot User OAuth Token"
- Add to .env:
SLACK_BOT_TOKEN=xoxb-YOUR-TOKEN-HERE
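Similarly, the bot token can be verified with a single auth_test call (assumes SLACK_BOT_TOKEN is exported into the environment):

```python
# Sanity-check the Slack bot token (assumes SLACK_BOT_TOKEN is set in the environment)
import os
from slack_sdk import WebClient

client = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))
resp = client.auth_test()  # raises SlackApiError if the token is rejected
print(resp["team"], resp["user"])
```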
Ensure AWS credentials are configured:
# Configure AWS CLI
aws configure
# Or set environment variables
export AWS_REGION=us-east-1
export AWS_PROFILE=default
IMPORTANT: Never commit the .env file to version control. It's already in .gitignore.
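Before running the agent, a quick boto3 check (boto3 is pulled in by langchain-aws and bedrock-agentcore) confirms that credentials and region resolve correctly:

```python
# Verify AWS credentials and region before the agent calls Bedrock
import boto3

session = boto3.Session()
print("Region:", session.region_name)  # should match AWS_REGION
print("Account:", boto3.client("sts").get_caller_identity()["Account"])  # fails fast if credentials are missing
```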
from github import Github
from mcp_registry.github import create_issue, push_files
import os
# Initialize client
client = Github(os.getenv("GITHUB_TOKEN"))
# Create an issue
result = create_issue(
client=client,
owner="username",
repo="repository",
title="Bug report",
body="Description of the bug",
labels=["bug"]
)
print(f"Created issue: {result['url']}")from slack_sdk import WebClient
from mcp_registry.slack import post_message, list_channels
import os
# Initialize client
client = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))
# List channels
channels = list_channels(client=client, limit=50)
# Post message
result = post_message(
client=client,
channel_id="C1234567890",
text="Hello from MCP tools!"
)
print(f"Message posted at: {result['timestamp']}")The project includes a test suite that can be run using the provided test script or directly with pytest:
# Run all tests using the test script
./run_tests.sh
There are two main ways to test MCP functionality:
Test the MCP server directly using the MCP inspector or client tools:
# Start MCP server (if applicable)
uv run python -m mcp_server
# Test with MCP inspector
npx @modelcontextprotocol/inspector
Test MCP integration through configuration files:
Update Claude Desktop Config (~/.claude/config.json or similar):
{
"mcpServers": {
"code-execution-mcp": {
"command": "uv",
"args": ["run", "python", "-m", "mcp_server"],
"env": {
"UV_PROJECT_ENVIRONMENT": ".venv"
}
}
}
}
To modify the MCP server configuration:
- Edit the config file in ~/.claude/config.json (or your Claude config location)
- Update the server name, command, arguments, or environment variables
- Restart Claude Desktop to pick up changes
- Test the new configuration
Example custom configuration:
{
"mcpServers": {
"custom-mcp-name": {
"command": "uv",
"args": ["run", "python", "custom_mcp.py"],
"env": {
"GITHUB_TOKEN": "${GITHUB_TOKEN}",
"SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}",
"UV_PROJECT_ENVIRONMENT": ".venv"
}
}
}
}
You can also run Python code directly for quick testing:
# Run main application
uv run python code_exec_with_mcp_agent.py
To compare the performance and token usage between the code execution approach and the regular MCP approach:
# Run the regular MCP comparison tests
./regular_mcp/run_tests.sh
This will run the same test suite using the traditional MCP approach (sending all tool definitions to the model) and generate comparison results. You can then compare:
- Token Usage: Regular MCP uses 100,000+ tokens vs ~27,000 tokens for Code Execution MCP
- Cost Efficiency: Regular MCP costs ~$0.36/request vs ~$0.11/request for Code Execution MCP
- Response Quality: Compare accuracy and completeness of responses
See the examples in the Comparison Results section below for detailed metrics.
Below are real examples comparing the two approaches on the same task:
Task: Analyze ALL repositories owned by madhurprash and identify the top 5 most starred repositories
Code Execution MCP approach:
Metrics:
- Latency: 53.49 seconds
- Input Tokens: 26,888
- Output Tokens: 1,888
- Total Tokens: 28,776
- Estimated Cost: $0.109
Result Quality:
- ✅ Correctly analyzed all 83 repositories
- ✅ Provided exact star counts (6, 5, 3, 3, 3)
- ✅ Correctly identified primary languages (Jupyter Notebook, Python)
- ✅ Complete and accurate descriptions
- ✅ Generated executable Python code for verification
Generated Code: The agent generated Python code (sketched after this list) that:
- Used PyGithub to fetch all repositories
- Sorted by star count
- Extracted top 5 with complete metadata
- Formatted results in both human-readable and JSON formats
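For reference, the generated code followed roughly this shape. It is a reconstruction based on the bullets above, not the agent's verbatim output:

```python
# Reconstruction of the agent-generated analysis code (illustrative only)
import json
import os

from github import Github

client = Github(os.getenv("GITHUB_TOKEN"))
user = client.get_user("madhurprash")

# Fetch all repositories owned by the user with the metadata we care about
repos = [
    {
        "name": repo.name,
        "stars": repo.stargazers_count,
        "language": repo.language,
        "description": repo.description,
    }
    for repo in user.get_repos()
]

# Sort by star count and keep the top 5
top_5 = sorted(repos, key=lambda r: r["stars"], reverse=True)[:5]

# Emit both human-readable and JSON output
for repo in top_5:
    print(f"{repo['name']}: {repo['stars']} stars ({repo['language']})")
print(json.dumps(top_5, indent=2))
```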
Regular MCP approach:
Metrics:
- Latency: 37.51 seconds
- Input Tokens: 112,837
- Output Tokens: 1,241
- Total Tokens: 114,078
- Estimated Cost: $0.357
Result Quality:
- ⚠️ Could not determine exact star counts
- ⚠️ Had to rely on GitHub API sort order
- ⚠️ Less precise language identification
- ⚠️ No executable code generated for verification
Key Observations:
- The API response didn't include actual star counts
- Had to make assumptions based on sort order
- Results were less precise and verifiable
| Metric | Code Execution MCP | Regular MCP | Improvement |
|---|---|---|---|
| Token Usage | 28,776 | 114,078 | 74.8% reduction |
| Cost per Request | $0.109 | $0.357 | 69.5% savings |
| Accuracy | Exact star counts | Approximate order | More precise |
| Verifiability | Generated code | No code | More transparent |
| Latency | 53.5s | 37.5s | 42% slower* |
* While code execution had higher latency in this example (53.5s vs 37.5s) due to code generation and execution, the dramatic cost savings and improved accuracy make it the preferred approach for most use cases.
- Dramatic Cost Reduction: 69.5% lower cost per request
- Better Accuracy: Direct API access provides exact data
- Transparency: Generated code can be reviewed and verified
- Flexibility: Agent can adapt code based on API responses
- Future-Proof: Code can handle API changes dynamically
This implementation follows the "Code Execution with MCP" pattern:
- Minimal Token Usage: Tools are Python functions, not verbose schemas
- Progressive Discovery: Import only what you need, when you need it
- Direct Execution: Simple function calls, no middleware
- Result Filtering: Structured returns for easy data filtering
- Stateless Tools: Each tool is independent and composable (see the sketch below)
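To make these principles concrete, here is a sketch of what a registry tool can look like. The signature is inferred from the usage example earlier in this README, so the actual implementation in mcp_registry/github may differ:

```python
# Sketch of a stateless, type-annotated registry tool with a filtered, structured return
# (signature inferred from the usage examples above; not the project's exact implementation)
from typing import Any

from github import Github


def create_issue(
    client: Github,
    owner: str,
    repo: str,
    title: str,
    body: str = "",
    labels: list[str] | None = None,
) -> dict[str, Any]:
    """Create a GitHub issue and return only the fields the agent needs."""
    repository = client.get_repo(f"{owner}/{repo}")
    issue = repository.create_issue(title=title, body=body, labels=labels or [])
    return {"issue_number": issue.number, "url": issue.html_url}
```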
Traditional approach:
Load all 22 tool schemas → 50,000+ tokens
Code execution approach:
List available tools → 650 tokens
Import specific tool → Execute directly
This represents a 98.7% reduction in token usage.
# Interactive mode
uv run python code_exec_with_mcp_agent.py
# Single query mode
uv run python code_exec_with_mcp_agent.py --query "List all GitHub repositories for user anthropics"
# With custom model
uv run python code_exec_with_mcp_agent.py --model anthropic.claude-haiku-4-5-20251001-v1:0
# Enable debug logging
uv run python code_exec_with_mcp_agent.py --debug
The agent will use the filesystem tools to progressively discover MCP tools and execute code to accomplish your tasks.
These tools are designed to work with Amazon Bedrock AgentCore Code Interpreter for secure, sandboxed execution:
# In AgentCore sandbox
from mcp_registry.github import create_issue
# Execute tool
result = create_issue(...)
# Filter result before returning to model
return {
"issue_number": result["issue_number"],
"url": result["url"]
}
- MCP Registry with 22 tools (GitHub: 14, Slack: 8)
- Filesystem-based tool organization
- Type-safe implementations with comprehensive docs
- Amazon Bedrock integration (Claude Sonnet 4.5)
- AgentCore Code Interpreter integration
- REST API server
- Deployment automation
See IMPLEMENTATION_DESIGN.md for the complete roadmap.
