A speed-first Rust proxy that sits between your AI coding harness and your LLMs. A local 8B classifier (Bonsai) decides in under 200 ms whether each request goes to cloud (via Manifest) or local inference (via llama-swap). Automatic fallback, system-prompt rewriting for local models, and an MCP-triggered iterative code-review loop that runs entirely on your own hardware.
    coding harness (omp / claude / vibe / opencode / codex / droid)
          │
          ▼
    brainrouter :9099
          │
    ┌─────┼─────────────────────────────────────────────┐
    │     ├─ model=auto  → Bonsai classifies query      │
    │     │      Cloud ──── Manifest :3001              │
    │     │      Local ──── llama-swap :8081            │
    │     ├─ model=local → rewrite prompt → llama-swap  │
    │     └─ model=cloud → Manifest (direct)            │
    └───────────────────────────────────────────────────┘
          │
    on Manifest fail ───── llama-swap fallback_model
          │
    on task complete ────── review loop → local LLM → dashboard

- One endpoint, all harnesses. OpenAI-compatible on `POST /v1/chat/completions`. Anthropic-compatible on `POST /v1/messages`. Every harness connects to the same `:9099`.
- Three routing modes. `auto` uses Bonsai classification (<200 ms). `local` rewrites the system prompt and goes straight to llama-swap. `cloud` goes straight to Manifest.
- Local prompt rewriting. OMP's 15–20 K token system prompt overwhelms small local models. Local mode replaces it with a lean ~500-token prompt that includes anti-loop directives.
- Manifest handles cloud failover. Manifest runs locally in Docker and picks the right cloud provider (Anthropic, OpenAI, Copilot, Google, Mistral, DeepSeek, etc.) with its own automatic fallbacks.
- MCP code review. `mcp_brainrouter_request_review` triggers an iterative review loop (up to 5 rounds by default). The review LLM reads your PRD, git diff, and task summary, then either approves or gives actionable feedback.
- Dashboard. Live routing feed, review session list, version display, one-click upgrades and service restarts, all at http://127.0.0.1:9099.

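The three routing modes and the Manifest fallback can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not brainrouter's actual code: `Route`, `Decision`, `pick_route`, and `with_fallback` are hypothetical names, and the classifier is passed in as a closure standing in for Bonsai.

```rust
/// Where a request is sent. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Route {
    Manifest,  // cloud, via Manifest :3001
    LlamaSwap, // local, via llama-swap :8081
}

/// Classifier verdict. Stands in for Bonsai's cloud/local decision.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Decision {
    Cloud,
    Local,
}

/// Map the request's `model` field to a route.
/// `classify` is only invoked for `model=auto`.
fn pick_route(model: &str, classify: impl Fn() -> Decision) -> Option<Route> {
    match model {
        "auto" => Some(match classify() {
            Decision::Cloud => Route::Manifest,
            Decision::Local => Route::LlamaSwap,
        }),
        "local" => Some(Route::LlamaSwap),
        "cloud" => Some(Route::Manifest),
        // Assumption: other model names are left to the caller to handle.
        _ => None,
    }
}

/// If Manifest was chosen but the call failed, fall back to llama-swap.
fn with_fallback(route: Route, manifest_ok: bool) -> Route {
    if route == Route::Manifest && !manifest_ok {
        Route::LlamaSwap
    } else {
        route
    }
}
```

For example, `pick_route("auto", || Decision::Local)` yields `Some(Route::LlamaSwap)`, and `with_fallback(Route::Manifest, false)` yields `Route::LlamaSwap`.
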
- Install (one script)
- Configure
- Connect your harness
- Dashboard guide
- MCP code review guide
- Bridge: Discord and Signal
- Reference
For Fedora Linux with multiple users, one script installs and configures everything. Run it as a user with sudo access:
git clone https://github.com/ajaxdude/brainrouter ~/ai/projects/brainrouter
cd ~/ai/projects/brainrouter
sudo bash install.sh

The script installs the following (idempotent, safe to re-run):
- System packages: git, golang, toolbox, docker, vulkan headers
- bun: JavaScript runtime for oh-my-pi, installed system-wide
- oh-my-pi: installed for every human user via bun
- Bonsai Q4_K_M: downloaded to `/opt/models/bonsai/` (~5.2 GB)
- Manifest: cloud LLM router running as a system Docker service on port 3001
- llama-swap: local model runner as a system Docker service on port 8081
- brainrouter: compiled and installed to `/usr/local/bin/brainrouter`
- llama-server-toolbox: wrapper at `/usr/local/bin/llama-server-toolbox`
- toolbox container `llama-vulkan-radv`: AMD RADV Vulkan environment
- Shared config: `/etc/brainrouter/brainrouter.yaml` and `/etc/brainrouter/env`
- Per-user systemd services: brainrouter enabled for every user, auto-starts at boot via `loginctl` linger
- Shell environment: `/etc/profile.d/ai-stack.sh` sets PATH and harness env vars for all users

The Manifest API key cannot be automated (you create it in the browser wizard):
- Open http://localhost:3001, complete the setup wizard, and add your cloud API keys.
- Go to Settings → API Keys → Create key and copy the `mnfst_…` key.
- Paste it into the shared env file:

    sudo nano /etc/brainrouter/env
    # Replace: MANIFEST_API_KEY=mnfst_REPLACE_WITH_YOUR_KEY

- Reboot. All users come up with brainrouter running automatically.

Or, without rebooting, run this for each user:

    sudo -u USERNAME XDG_RUNTIME_DIR=/run/user/$(id -u USERNAME) \
      systemctl --user restart brainrouter

- The Manifest API key lives in `/etc/brainrouter/env` (owned `root:aistack`, mode `640`). All users in the `aistack` group can read it. The script adds every human user to this group.
- `loginctl enable-linger` is set for each user so brainrouter starts at boot without anyone needing to log in.
- New users added after install: their service file comes from `/etc/skel`; run `sudo bash install.sh` again (idempotent) to complete their setup.
- To edit which local model llama-swap serves:

    sudo nano /opt/ai/llama-swap/config.yaml
    sudo systemctl restart llama-swap

After install.sh runs, the system config is already in place at /etc/brainrouter/brainrouter.yaml.
Each user also gets a copy seeded to ~/.config/brainrouter/brainrouter.yaml at install time.
The only value you need to change post-install is fallback_model — set it to match a model key
in /opt/ai/llama-swap/config.yaml:
sudo nano /etc/brainrouter/brainrouter.yaml
sudo nano /opt/ai/llama-swap/config.yaml # define the model
sudo systemctl restart llama-swap

# /etc/brainrouter/brainrouter.yaml (shared for all users)
manifest:
  base_url: "http://localhost:3001/v1"
  api_key_env: MANIFEST_API_KEY        # key lives in /etc/brainrouter/env
llama_swap:
  base_url: "http://localhost:8081/v1"
  fallback_model: "your-local-model"   # must match a key in /opt/ai/llama-swap/config.yaml
bonsai:
  model_path: "/opt/models/bonsai/prism-ml_Bonsai-8B-unpacked-Q4_K_M.gguf"

The Manifest API key lives in /etc/brainrouter/env (readable by the aistack group; install.sh adds all human users to it):
sudo nano /etc/brainrouter/env
# MANIFEST_API_KEY=mnfst_your_key_here

After any config change, restart brainrouter for your user:
systemctl --user restart brainrouter

brainrouter includes an install subcommand that patches your harness config automatically.
# Auto-install (patches config files in place, asks for confirmation):
./target/release/brainrouter install omp
./target/release/brainrouter install claude --shell-rc
./target/release/brainrouter install vibe
./target/release/brainrouter install opencode
./target/release/brainrouter install codex
./target/release/brainrouter install droid
# Skip confirmation prompt:
./target/release/brainrouter install omp --yes

# ~/.omp/agent/models.yml: add under providers
providers:
  brainrouter:
    baseUrl: http://127.0.0.1:9099/v1
    api: openai-completions
    auth: none
    models:
      - id: auto
        name: Brainrouter (auto)
      - id: local
        name: Brainrouter (local)
      - id: cloud
        name: Brainrouter (cloud)

// ~/.omp/agent/mcp.json
{
"mcpServers": {
"brainrouter": {
"type": "stdio",
"command": "/home/yourname/ai/projects/brainrouter/target/release/brainrouter",
"args": ["mcp", "--socket", "/run/user/$(id -u)/brainrouter.sock"],
"timeout": 300000
}
}
}

# Register MCP tool:
brainrouter install claude --shell-rc
# Or manually:
claude mcp add-json brainrouter '{
"type": "stdio",
"command": "/path/to/brainrouter",
"args": ["mcp", "--socket", "/run/user/$(id -u)/brainrouter.sock"]
}' --scope user
# Route Claude Code through brainrouter (add to ~/.zshrc):
export ANTHROPIC_BASE_URL=http://127.0.0.1:9099
export ANTHROPIC_AUTH_TOKEN=not-used

# Append to ~/.vibe/config.toml
[[providers]]
name = "brainrouter"
api_base = "http://127.0.0.1:9099/v1"
api_style = "openai"
backend = "generic"
[[models]]
name = "brainrouter-auto"
provider = "brainrouter"
alias = "auto"
mcp_servers = [
{ name = "brainrouter", command = "/path/to/brainrouter", args = ["mcp", "--socket", "/run/user/$(id -u)/brainrouter.sock"] },
]

// Merge into ~/.config/opencode/config.json
{
"provider": {
"brainrouter": {
"npm": "@ai-sdk/openai-compatible",
"name": "Brainrouter",
"options": { "baseURL": "http://127.0.0.1:9099/v1" },
"models": { "auto": { "model": "auto", "name": "Brainrouter (auto)" } }
}
},
"mcp": {
"brainrouter": {
"type": "local",
"command": ["/path/to/brainrouter", "mcp", "--socket", "/run/user/$(id -u)/brainrouter.sock"]
}
}
}

# ~/.codex/config.toml
model = "auto"
model_provider = "brainrouter"
[model_providers.brainrouter]
name = "Brainrouter"
base_url = "http://127.0.0.1:9099/v1"
[mcp_servers.brainrouter]
command = "/path/to/brainrouter"
args = ["mcp", "--socket", "/run/user/$(id -u)/brainrouter.sock"]

// ~/.factory/mcp.json
{
"custom_models": [{
"model": "brainrouter-auto",
"base_url": "http://127.0.0.1:9099/v1",
"api_key": "not-used",
"provider": "anthropic"
}],
"mcpServers": {
"brainrouter": {
"type": "stdio",
"command": "/path/to/brainrouter",
"args": ["mcp", "--socket", "/run/user/$(id -u)/brainrouter.sock"]
}
}
}

Note: `provider: "anthropic"` is required for droid. Droid's `openai` mode posts to `/responses` (not served here). `anthropic` mode posts to `/v1/messages`, which brainrouter handles.

For shared machines with multiple users, see deploy/brainrouter_ecosystem.md which covers:
- System-level services: llama-swap and Manifest run as system Docker services, shared by all users.
- Per-user brainrouter: Each user runs their own brainrouter instance as a systemd user service.
- Shared model storage: GGUFs in `/opt/models` with `aistack` group permissions.
- Automated scripts: `deploy/deploy.sh --multi-user` for the admin, `deploy/user-setup.sh` for each user.
- Uninstall: `deploy/uninstall.sh` and `deploy/uninstall_brainrouter_ecosystem.md`.

Open http://127.0.0.1:9099 in a browser. The dashboard auto-refreshes every 3 seconds.
The top panel shows the most recent request as it moves through the pipeline:
harness → Bonsai classify → [Cloud: Manifest] or [Local: llama-swap] → response
Each stage shows:
- Bonsai decision: `cloud` or `local` badge
- Provider: which upstream handled it
- Model: the model key that was used
- Latency: end-to-end time in ms
- Fallback indicator ↩: appears when Manifest failed and llama-swap handled it instead

The table below the flow panel shows the last 50 routing events, deduplicated:
- Identical requests within a 30-second window are collapsed into a single row with a `×N` badge and cumulative latency.
- Review iterations within the same session collapse into one row with an `iter N` badge.
iter Nbadge. - Hover the Prompt cell to see the full prompt excerpt.
- The Folder badge shows which project directory the request came from.
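
The collapsing rule for identical requests can be sketched as follows. This is a rough illustration, not the dashboard's actual logic: the `Event` and `Row` types and their fields are hypothetical, events are assumed to arrive sorted by time, and the 30-second window is assumed to be measured from the first event in a row.

```rust
/// One routing event as received. Field names are hypothetical.
#[derive(Debug, Clone)]
struct Event {
    prompt: String,
    at_secs: u64, // arrival time in seconds
    latency_ms: u64,
}

/// One table row after collapsing.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    prompt: String,
    first_at_secs: u64,
    count: u32,            // rendered as the ×N badge
    total_latency_ms: u64, // cumulative latency
}

const WINDOW_SECS: u64 = 30;

/// Merge each event into the most recent row with the same prompt,
/// provided that row started no more than WINDOW_SECS earlier.
fn collapse(events: &[Event]) -> Vec<Row> {
    let mut rows: Vec<Row> = Vec::new();
    for ev in events {
        let idx = rows.iter().rposition(|r| {
            r.prompt == ev.prompt
                && ev.at_secs.saturating_sub(r.first_at_secs) <= WINDOW_SECS
        });
        match idx {
            Some(i) => {
                rows[i].count += 1;
                rows[i].total_latency_ms += ev.latency_ms;
            }
            None => rows.push(Row {
                prompt: ev.prompt.clone(),
                first_at_secs: ev.at_secs,
                count: 1,
                total_latency_ms: ev.latency_ms,
            }),
        }
    }
    rows
}
```

Anchoring the window at a row's first event keeps a long burst of identical requests from extending one row indefinitely; a sliding window would be an equally plausible reading of the rule.
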
The header row shows current installed versions of:
- llama-swap — the local model router binary
- llama.cpp — the llama-server build inside the toolbox container
- Manifest — the running Docker container (image date · short hash)
- toolbox — the OCI image version label
When a newer version is available (checked against GitHub / Docker Hub on each poll), an orange component → new-version button appears. Click it to upgrade. Each button is labelled so you know exactly what will be updated.
Four restart buttons in the top nav:
| Button | What it does |
|---|---|
| Restart llama-swap | systemctl --user restart llama-swap |
| Restart llama.cpp | Refreshes the toolbox container (runs configured restart script) |
| Restart Manifest | docker compose restart manifest |
| Restart brainrouter | systemctl --user restart brainrouter — page reloads after 3 s |
Click Review Sessions in the nav to see the session list:
- Each row shows task ID, status (`pending` / `approved` / `needs_revision` / `escalated`), iteration count, reviewer type (LLM or human), and timestamps.
- Click a row to open the session detail view with the full conversation history.
- If a session is `escalated` (the LLM couldn't resolve it after max iterations), a Resolve panel appears. Type your feedback and submit to close the loop.

A collapsible panel on the dashboard lets you control how code reviews run:
| Setting | Options | Effect |
|---|---|---|
| Review mode | Auto / Force Cloud / Force Local | Auto lets Bonsai decide; Force overrides for all reviews in this session |
| Local model | dropdown of llama-swap models | When forcing local, which specific model to use |
Changes take effect immediately for new review requests. The setting persists across daemon restarts.
The review tool is exposed over MCP so any harness can call it after completing a task.
- Your harness calls `mcp_brainrouter_request_review` with a task ID and summary.
- brainrouter gathers context: your project's PRD (auto-detected from `docs/PRD.md`, `PRD.md`, or `README.md`), the current `git diff HEAD`, and any `AGENTS.md`.
- A review prompt is assembled and sent to the configured LLM (cloud or local, depending on review mode).
- The LLM responds with `STATUS: approved` or `STATUS: needs_revision` plus feedback.
- If `needs_revision`, the harness implements the feedback and calls `mcp_brainrouter_request_review` again, up to 5 iterations.
- After 5 failed iterations (or an LLM error), the session escalates to human review at http://127.0.0.1:9099/review/.

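The verdict handling in the steps above can be sketched as a parser plus a small state transition. This is an illustrative sketch only: `Verdict`, `Status`, `parse_verdict`, and `next_status` are hypothetical names, and treating an unparseable reply like an LLM error (escalate) is an assumption.

```rust
/// Verdict parsed from the reviewer's reply.
#[derive(Debug, PartialEq)]
enum Verdict {
    Approved,
    NeedsRevision,
}

/// Session status after an iteration.
#[derive(Debug, PartialEq)]
enum Status {
    Approved,
    NeedsRevision,
    Escalated,
}

/// Read the first `STATUS:` line in the reply.
/// Returns None if there is no such line or its value is unrecognized.
fn parse_verdict(reply: &str) -> Option<Verdict> {
    for line in reply.lines() {
        if let Some(rest) = line.trim().strip_prefix("STATUS:") {
            return match rest.trim() {
                "approved" => Some(Verdict::Approved),
                "needs_revision" => Some(Verdict::NeedsRevision),
                _ => None,
            };
        }
    }
    None
}

/// Decide the status after iteration `iteration` (1-based).
/// `reply` is None when the LLM call itself failed.
fn next_status(iteration: u32, max_iterations: u32, reply: Option<&str>) -> Status {
    match reply.and_then(parse_verdict) {
        Some(Verdict::Approved) => Status::Approved,
        Some(Verdict::NeedsRevision) if iteration < max_iterations => Status::NeedsRevision,
        // Out of iterations, LLM error, or (assumed) unparseable reply.
        _ => Status::Escalated,
    }
}
```

With the default `max_iterations` of 5, `next_status(5, 5, Some("STATUS: needs_revision"))` escalates, while the same reply on iteration 1 returns `Status::NeedsRevision`.
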
| Parameter | Required | Description |
|---|---|---|
| `taskId` | yes | Unique ID for this task, e.g. `feature-20260424-001` |
| `summary` | yes | 2–3 sentences: what changed, why, and any assumptions |
| `details` | no | Additional technical context |
| `conversationHistory` | no | Array of strings: recent conversation for context |
| `cwd` | no (strongly recommended) | Absolute path to the project directory, needed for an accurate git diff; falls back to the peer-cred-resolved cwd if omitted |

If you are an LLM agent completing a task in a project, add this to your workflow:
After completing all work, call mcp_brainrouter_request_review with:
taskId: "<type>-<YYYYMMDD>-<seq>" (e.g. feature-20260424-001)
summary: "<2–3 sentences: what changed, why, assumptions>"
cwd: "<absolute path to the project root>"
details: "<optional extra context, changed files, security notes>"
If the response status is "needs_revision", read the feedback, fix the issues,
then call mcp_brainrouter_request_review again. Repeat until "approved".
Do not consider the task complete until you receive status: "approved".
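
The `<type>-<YYYYMMDD>-<seq>` convention for `taskId` can be generated and checked with two small helpers. This is a sketch under assumptions: `task_id` and `looks_like_task_id` are hypothetical helpers, not part of brainrouter, and the three-digit zero padding is inferred from the example `feature-20260424-001`.

```rust
/// Build a task ID of the form `<type>-<YYYYMMDD>-<seq>`.
/// The caller supplies the date; no clock access is needed.
fn task_id(kind: &str, year: u32, month: u32, day: u32, seq: u32) -> String {
    format!("{kind}-{year:04}{month:02}{day:02}-{seq:03}")
}

/// Loose shape check: non-empty type, 8-digit date, numeric sequence.
/// Splitting from the right allows hyphens inside the type part.
fn looks_like_task_id(id: &str) -> bool {
    let parts: Vec<&str> = id.rsplitn(3, '-').collect(); // [seq, date, type]
    parts.len() == 3
        && !parts[2].is_empty()
        && parts[1].len() == 8
        && parts[1].bytes().all(|b| b.is_ascii_digit())
        && !parts[0].is_empty()
        && parts[0].bytes().all(|b| b.is_ascii_digit())
}
```
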
| Tool | Parameters | Description |
|---|---|---|
| `request_review` | `taskId`, `summary`, `cwd?`, `details?`, `conversationHistory?` | Start or continue a review |
| `get_session_list` | none | List all review sessions |
| `get_session_details` | `sessionId` | Full detail for one session |
| `resolve_session` | `sessionId`, `feedback` | Human resolves: "lgtm"/"ok"/"approved" → approved; any other text → needs_revision |

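The `resolve_session` rule maps a short list of approval words to `approved` and everything else to `needs_revision`. A minimal sketch of that mapping, assuming (this is not confirmed by the table) that matching ignores surrounding whitespace and letter case; `Resolution` and `classify_feedback` are hypothetical names:

```rust
/// Outcome of a human resolution.
#[derive(Debug, PartialEq)]
enum Resolution {
    Approved,
    NeedsRevision,
}

/// Map human feedback text to a resolution.
/// Assumption: comparison is trimmed and case-insensitive.
fn classify_feedback(feedback: &str) -> Resolution {
    match feedback.trim().to_ascii_lowercase().as_str() {
        "lgtm" | "ok" | "approved" => Resolution::Approved,
        _ => Resolution::NeedsRevision,
    }
}
```
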
brainrouter includes bridge transports that connect Discord and Signal to OMP. Each bridge runs as part of the brainrouter daemon and shells out to the omp CLI to handle queries. Enable them in the bridge section of brainrouter.yaml.
| Command | Description |
|---|---|
| `!br ping` | Health check |
| `!br reset` | Clear conversation session |
| `!br status` | Show current model |
| `!br auto` / `local` / `cloud` | Set routing mode |
| `!br <model-name>` | Set specific llama-swap model (names containing `-` or `.`) |
| `!br list` | List all models (routing + llama-swap) |
| `!br model <name> <query>` | One-off model override for a single query |
| `!br ls` | List files in current working directory |
| `!br cd <dir>` | Change working directory |
| `!br ..` | Go up one directory |
| `!br mkdir <name>` | Create a directory |
| `!br review` | Show review mode |
| `!br review auto\|local\|cloud` | Set review mode |
| `!br help` / `!br ?` | Show command help |
| bare text | Send query directly (no prefix needed) |

| Command | Description |
|---|---|
| `!br ping` | Health check |
| `!br reset` | Clear conversation session |
| `!br status` | Show current model |
| `!br auto` / `local` / `cloud` | Set routing mode |
| `!br <model-name>` | Set specific llama-swap model (names containing `-` or `.`) |
| `!br model <name>` | Set model (legacy form) |
| `!br list` | List models |
| `!br review` | Show current review mode |
| `!br review auto\|local\|cloud` | Set review mode |
| `!br help` / `!br ?` | Show command help |
| bare text or `!br <query>` | Send query |

Specific llama-swap models can be set directly via !br <model-name> (e.g. !br gemma-4-26b-a4b). Model names are detected by containing - or .. Use !br list to see all available models.
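
The model-name detection rule can be sketched as part of a small command parser. This is an illustration under assumptions, not the bridge's actual parser: `Command`, `BUILTINS`, and `parse` are hypothetical, the `!br` prefix is hard-coded, and fixed commands are assumed to be matched before the `-`/`.` check so that `!br ..` is not mistaken for a model name.

```rust
#[derive(Debug, PartialEq)]
enum Command {
    Builtin(String),    // ping, reset, ls, cd, .., review, ...
    SetRouting(String), // auto / local / cloud
    SetModel(String),   // a llama-swap model key
    Query(String),      // anything else goes to the model
}

/// First words that name fixed commands (taken from the command tables).
const BUILTINS: [&str; 12] = [
    "ping", "reset", "status", "list", "model", "ls", "cd", "..", "mkdir", "review", "help", "?",
];

fn parse(msg: &str) -> Command {
    let body = msg.trim();
    // Bare text without the prefix is treated as a query.
    let arg = match body.strip_prefix("!br") {
        Some(rest) if rest.is_empty() || rest.starts_with(char::is_whitespace) => rest.trim(),
        _ => return Command::Query(body.to_string()),
    };
    let first = arg.split_whitespace().next().unwrap_or("");
    if BUILTINS.contains(&first) {
        return Command::Builtin(arg.to_string());
    }
    match arg {
        "auto" | "local" | "cloud" => Command::SetRouting(arg.to_string()),
        // A single token containing '-' or '.' is taken to be a model name.
        _ if !arg.contains(char::is_whitespace) && (arg.contains('-') || arg.contains('.')) => {
            Command::SetModel(arg.to_string())
        }
        _ => Command::Query(arg.to_string()),
    }
}
```

One consequence of the rule is that a one-word query containing a dot or hyphen would be read as a model name, which is why multi-word input is excluded from the check here.
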
Each channel (Discord) or conversation (Signal) maintains its own session. Sessions track conversation history, current working directory, and selected model. Use !br reset to clear a session.
| Data | Path |
|---|---|
| Discord sessions | ~/.local/share/omp-bridge/discord-sessions.json |
| Discord channel models | ~/.local/share/omp-bridge/discord-channel-models.json |
| Discord work dirs | ~/.local/share/omp-bridge/discord-work-dirs.json |
| Signal sessions | ~/.local/share/omp-bridge/signal-sessions.json |
| Signal channel models | ~/.local/share/omp-bridge/signal-channel-models.json |
| Signal work dirs | ~/.local/share/omp-bridge/signal-work-dirs.json |
Long responses are automatically chunked (1500 chars for Discord, 4000 chars for Signal).
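
A simple way to chunk a long response is to split on character boundaries, as sketched below. This is an assumption-laden illustration: `chunk` and the two constants are hypothetical names, and the real bridge may count bytes or split at line breaks rather than at a fixed character count.

```rust
const DISCORD_LIMIT: usize = 1500;
const SIGNAL_LIMIT: usize = 4000;

/// Split `text` into pieces of at most `max_chars` characters.
/// Works on `char`s, so multi-byte UTF-8 sequences are never cut in half.
fn chunk(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(max_chars)
        .map(|piece| piece.iter().collect::<String>())
        .collect()
}
```

A 3200-character reply sent through `chunk(reply, DISCORD_LIMIT)` yields three messages of 1500, 1500, and 200 characters; the same reply fits in one message under `SIGNAL_LIMIT`.
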
manifest:
base_url: "http://localhost:3001/v1" # required
api_key_env: MANIFEST_API_KEY # optional — name of env var holding mnfst_* key
llama_swap:
base_url: "http://localhost:8081/v1" # required
fallback_model: "my-model" # required — must match a key in llama-swap config
local_system_prompt: "/path/to/prompt.md" # optional — override built-in lean prompt
bonsai:
model_path: "/path/to/prism-ml_Bonsai-8B-unpacked-Q4_K_M.gguf" # required — absolute path
review:
max_iterations: 5 # LLM review rounds before escalating to human
forced_mode: "auto" # "auto" | "cloud" | "local" — default "auto"
forced_model: "my-model" # only used when forced_mode = "local"
bridge:
omp_path: "omp" # path to omp CLI binary
work_dir: "/home/you" # default working directory
aliases_config: "~/.config/omp-bridge/config.yaml" # model alias definitions
timeout_secs: 600 # per-query timeout
default_model: "brainrouter/auto" # model for new sessions
discord:
enabled: false # set true + provide token to activate
token: "Bot ..." # Discord bot token — required when enabled
prefix: "!" # command prefix
signal:
enabled: false # set true + provide account to activate
account: "+15551234567" # E.164 phone — required when enabled
group_id: "base64..." # restrict to one Signal group
prefix: "!" # command prefix
storage_path: "/path/to/signal-cli/data" # signal-cli storage
llama_swap_url: "http://localhost:8081" # for llama-list command

| Variable | Description |
|---|---|
| `RUST_LOG` | Log level filter (default `info`). Overrides `--log-level`. Example: `RUST_LOG=debug` |
| `HOME` | User home directory, used for default paths |
| `XDG_RUNTIME_DIR` | Runtime directory for the UDS socket (default `/run/user/$UID`) |
| `XDG_CONFIG_HOME` | Config directory for persisted review state (default `~/.config`) |
| `BRAINROUTER_MANIFEST_DIR` | Override the Manifest docker-compose directory for restart/upgrade |
| `<manifest.api_key_env>` | Dynamic: whichever env var is named in `manifest.api_key_env` (e.g. `MANIFEST_API_KEY`) holds the Manifest API key |

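
The `XDG_RUNTIME_DIR` default can be sketched as a small path-resolution helper. This is illustrative only: `socket_path` is a hypothetical function, the environment value and UID are passed in as parameters to keep the sketch free of platform calls, and treating an empty value as unset is an assumption.

```rust
use std::path::PathBuf;

/// Resolve the UDS socket path.
/// Uses `xdg_runtime_dir` when set and non-empty, else `/run/user/<uid>`.
fn socket_path(xdg_runtime_dir: Option<&str>, uid: u32) -> PathBuf {
    let base = match xdg_runtime_dir {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(format!("/run/user/{uid}")),
    };
    base.join("brainrouter.sock")
}
```

A caller would typically pass `std::env::var("XDG_RUNTIME_DIR").ok().as_deref()` as the first argument.
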
| Command | Description |
|---|---|
| `brainrouter serve` | HTTP proxy daemon. Listens on TCP `:9099` and UDS `/run/user/$UID/brainrouter.sock` |
| `brainrouter mcp` | MCP stdio server. Spawned by harnesses; forwards tool calls to the daemon over UDS |
| `brainrouter install <harness>` | Idempotently patches harness config. Harnesses: `omp`, `vibe`, `opencode`, `codex`, `droid`, `claude`, `pi` |

All on http://127.0.0.1:9099.
| Method | Path | Protocol | Notes |
|---|---|---|---|
| GET | `/health` | none | `{"status":"ok"}` |
| GET | `/v1/models` | OpenAI | Returns `auto`, `local`, `cloud` |
| POST | `/v1/chat/completions` | OpenAI | Main routing endpoint |
| POST | `/v1/messages` | Anthropic | For Claude Code and droid |

| Method | Path | Notes |
|---|---|---|
| GET | `/api/versions` | Installed versions + latest available |
| GET | `/api/routing-events` | Live routing events feed |
| GET | `/api/routing-stats` | Routing statistics |
| GET | `/api/service-health` | Service health status per provider |
| GET | `/api/bridge-status` | Bridge transport status (Discord / Signal) |
| GET | `/api/inference-status` | Current inference state (for progress bar) |
| GET | `/api/review-config` | Current review mode and forced model |
| POST | `/api/review-config` | Update review mode / forced model |
| GET | `/api/models/llama-swap` | Model list from llama-swap |
| POST | `/api/upgrade/llama-swap` | Build and install latest llama-swap binary |
| POST | `/api/upgrade/manifest` | Pull latest Manifest image and recreate container |
| POST | `/api/upgrade/toolbox` | Pull latest toolbox image and recreate container |
| POST | `/api/restart/:service` | Restart `llama-swap`, `manifest`, `llama-cpp`, or `brainrouter` |

| Method | Path | Notes |
|---|---|---|
| GET | `/review/` | Session dashboard |
| GET | `/review/session/:id` | Session detail |
| GET | `/review/api/sessions` | JSON session list |
| GET | `/review/api/sessions/:id` | JSON session detail |
| POST | `/review/api/request` | Start a review. Body: `{taskId, summary, details?}` |
| POST | `/review/api/resolve` | Resolve a review session. Body: `{sessionId, feedback}` |
| POST | `/review/api/continue` | Continue a review iteration |
| POST | `/review/api/lgtm` | Quick-approve a review session |
| POST | `/review/session/:id/resolve` | Human resolve. Body: `{feedback: "lgtm"}` |

src/
main.rs -- clap dispatcher (serve | mcp | install)
daemon.rs -- startup: loads Bonsai, wires state, starts server
server.rs -- hyper HTTP router
classifier.rs -- Bonsai 8B classifier (Cloud/Local decision)
router.rs -- routes to Manifest or llama-swap; circuit breaker; fallback
prompt_rewriter.rs -- system prompt rewriter for local mode
anthropic.rs -- Anthropic <> OpenAI protocol translation
mcp_server.rs -- JSON-RPC stdio, forwards to daemon over UDS
install.rs -- idempotent harness config merger
session.rs -- in-memory review session store
config.rs -- YAML config parsing and validation
types.rs -- OpenAI-compatible request/response types
lib.rs -- library root
peer_cwd.rs -- peer CWD resolution via /proc
routing_events.rs -- routing event store (last 500 events)
inference_state.rs -- inference state tracking
review/
mod.rs -- ReviewService
review_loop.rs -- iterative LLM review loop
context.rs -- gathers PRD, git diff, AGENTS.md
prompt.rs -- review prompt template
escalation/
mod.rs -- /review/* HTTP handlers + ReviewRequest parsing
templates/ -- embedded HTML: dashboard + session detail
provider/
mod.rs -- Provider trait + SseStream type
openai.rs -- OpenAI-compatible HTTP adapter
health.rs -- circuit breaker (3 failures -> open; 60 s cooldown)
stream.rs -- TimeoutStream: chunk stall detection
bridge/
mod.rs -- bridge feature flags and init
core.rs -- shared bridge logic (session, OMP dispatch, chunking)
persist.rs -- JSON persistence for sessions, models, work dirs
discord/
mod.rs -- Discord bot (serenity) with command handler
signal/
mod.rs -- Signal bot with polling loop
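
The tree above describes `health.rs` as a circuit breaker that opens after 3 failures and cools down for 60 s. A minimal sketch of that pattern follows; it is not the contents of `health.rs`. `Breaker` and its methods are hypothetical, failures are assumed to be counted consecutively, and the current time is passed in so the behaviour can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

const FAILURE_THRESHOLD: u32 = 3;
const COOLDOWN: Duration = Duration::from_secs(60);

/// Tracks consecutive failures for one upstream provider.
#[derive(Debug, Default)]
struct Breaker {
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl Breaker {
    /// True if a request may be sent at time `now`.
    /// While open, requests are refused until the cooldown has elapsed.
    fn allows(&self, now: Instant) -> bool {
        match self.opened_at {
            Some(opened) => now.duration_since(opened) >= COOLDOWN,
            None => true,
        }
    }

    /// A successful call closes the breaker and resets the counter.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    /// A failed call increments the counter; at the threshold the
    /// breaker opens (or re-opens) with `now` as the start of the cooldown.
    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= FAILURE_THRESHOLD {
            self.opened_at = Some(now);
        }
    }
}
```

In a router, a refused `allows` check on the cloud provider would be the point at which the request is sent to the local fallback instead.
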
| Service | Purpose | Default URL |
|---|---|---|
| Manifest | Cloud LLM router — provider selection, failover, cost tracking | http://localhost:3001 |
| llama-swap | Local model runner — spawns llama-server on demand | http://localhost:8081 |
| Bonsai | In-process classifier — no HTTP hop | loaded from model_path |
cargo test

74 tests across the codebase: circuit breaker, Anthropic protocol translation, idempotent config merging, review session lifecycle, classifier parse logic, request translation, failover, install, review loop, and bridge persistence.