An open-source Python framework for building autonomous AI agents that self-diagnose failures, trigger retraining based on performance decay, and dynamically reconnect to MCP-enabled tools using Model Context Protocol.
Quick Start • Features • Examples • Contributing
AgentForge equips solo AI developers with a self‑healing agent runtime that automatically detects faults, retrains models, and fails over MCP tools without manual intervention. It is ideal for anyone shipping MCP‑enabled servers who wants to reduce operational overhead.
$ agentforge run
[AgentForge] Starting self-healing MCP agent...
Agent ID: agent-001
MCP Endpoint: localhost:8000
Heartbeat interval: 30s
Ready.
Current agent frameworks require manual intervention for failure recovery and version updates, causing downtime and inconsistent behavior in production AI systems.
| Feature | Description |
|---|---|
| Self‑Diagnosis & Failure Detection | Continuously monitors health metrics and emits a fault event within 200 ms of anomaly detection. |
| Adaptive Retraining Trigger | Upon fault acknowledgment, schedules a retraining job within 1 second and persists job state for resume. |
| Dynamic MCP Reconnection | Detects endpoint failure within 500 ms, fails over to a hot‑standby endpoint within 1 second, and transparently replays in‑flight requests. |
| Pluggable Health Probes | Supports custom probes for CPU, latency, error rate, etc., via a simple registration API. |
| Fault Event Bus | Emits structured fault events with trace IDs to standard output for observability. |
| Heartbeat Monitoring | Sends a heartbeat every configurable interval to verify liveness of the agent core. |
- Clone the repository:
git clone https://github.com/m2ai-portfolio/agentforge.git cd agentforge - Install the package in editable mode:
pip install -e . - (Optional) Set environment variables for logging and MCP endpoint:
export AGENTFORGE_LOG_LEVEL=DEBUG export AGENTFORGE_MCP_ENDPOINT=localhost:8000
- Run the agent:
You should see the startup banner and confirmation that the agent is ready.
agentforge run
Basic Agent Startup
Launch the agent with default settings.
$ agentforge run
[AgentForge] Starting self-healing MCP agent...
Agent ID: agent-001
MCP Endpoint: localhost:8000
Heartbeat interval: 30s
Ready.
Trigger Fault Detection Test
Execute the built‑in self‑test to verify fault event emission.
$ agentforge self-heal test-fault-detect
[TEST] Fault detection triggered
[FAULT] TraceID: a1b2c3d4 | Description: Simulated latency spike | Timestamp: 2025-09-16T12:34:56Z
Validate Adaptive Retraining Trigger
Run the orchestrator test to confirm retraining job initiation.
$ agentforge orchestrator test-retrain-trigger
[INFO] Fault acknowledgment received
[INFO] Loading model version 2.3.1 from registry
[RETRAIN] Job started | JobID: rt-20250916-001 | Status: RUNNING
AgentForge: Self-Healing MCP Agent Framework with Built-in Adaptive Orchestration/
agentforge/ # Core source code
__init__.py
cli.py # Command‑line interface entry point
core.py # Agent core and heartbeat logic
self_heal.py # Fault detection and self‑healing routines
orchestrator.py # Retraining job scheduling and persistence
mcp_client.py # MCP connector with hot‑standby failover
models.py # Pydantic data models (AgentConfig, MCPToolSpec)
tests/ # Test suite
test_cli.py
test_core.py
test_self_heal.py
test_orchestrator.py
test_mcp_client.py
assets/
infographic.png # Banner illustration used in README
.gitignore
LICENSE
README.md
pyproject.toml
spec.md
| Technology | Purpose |
|---|---|
| Python 3.11+ | Core language runtime |
| Click | Building the CLI (agentforge command) |
| Pytest | Running unit and integration tests |
Fork the repo, make your changes, run pytest to verify, and submit a pull request.
MIT
Matthew Snow -- M2AI | @m2ai-portfolio
