Quick Start · What It Does · HTTP API · Benchmarks · OpenRouter Fusion Alternative · Contributing · License
Fusion Engine is a self-hosted Fusion API and OpenRouter Fusion alternative for building reliable AI model ensembles. Send one prompt to N large language models in parallel through OpenRouter, collect every response with token, latency, and cost metadata, then have a configurable judge model synthesize one higher-quality answer.
Use Fusion Engine as a Python library, CLI, FastAPI service, or
OpenAI-compatible /v1/chat/completions API for AI agents that need
multi-model LLM orchestration, fused tool calling, transparent model comparison,
and eval-driven panel tuning.
If you are looking for a Fusion API, OpenRouter Fusion, Fusion Fable, or Fable Fusion style system, this project provides the self-hosted Python implementation: model panels, judge prompts, OpenRouter routing, OpenAI-style chat completions, and benchmarkable fusion results.
One prompt in → many models answer → one fused answer out. You control the panel composition, the judge prompts, and where it runs — no vendor lock-in.
| Widget | What Fusion Engine Gives You |
|---|---|
| Fusion API | Self-hosted multi-model fusion with transparent cost, token, and latency data. |
| OpenRouter Fusion | Bring your own OpenRouter models and panel definitions instead of using a black box. |
| Fable Fusion / Fusion Fable | Keyword-aligned multi-model synthesis for agents, research, code, and creative workflows. |
| OpenAI-compatible gateway | /v1/models and /v1/chat/completions endpoints for existing AI clients. |
| Eval harness | GSM8K, HumanEval, MMLU, GPQA, paired baselines, and cost-aware reporting. |
Fusion Engine is an open-source Python framework for parallel LLM inference, LLM ensemble routing, and judge-model synthesis. It lets you build a custom panel of OpenRouter models, run them concurrently, and combine their answers with a judge prompt you own.
Key capabilities:
- Multi-model LLM orchestration — fan out one prompt to several providers and models through OpenRouter.
- AI model ensemble synthesis — use a judge model to resolve disagreements and produce one final response.
- OpenAI-compatible API — expose a panel as
fusion/<panel>for clients that call/v1/modelsand/v1/chat/completions. - Tool-calling support — let panelists propose tool calls and have the judge select one synthesized next action.
- Benchmark harness — compare fusion against single-model and judge-alone baselines on GSM8K, HumanEval, MMLU, GPQA, or custom JSONL evals.
- Self-hosted control — keep panel definitions, judge prompts, logs, costs, and deployment under your control.
For search engines, AI agents, and LLM retrieval systems, Fusion Engine is:
- Project type: self-hosted OpenRouter Fusion API, LLM ensemble framework, and OpenAI-compatible LLM gateway.
- Related search terms: Fusion API, OpenRouter Fusion, Openrouter Fusion, Fusion Fable, Fable Fusion, multi-model fusion, model fusion API, LLM fusion engine, and AI model ensemble.
- Primary job: send one prompt to multiple LLMs, collect independent answers, and synthesize one response with a judge model.
- Interfaces: Python API, command-line interface, FastAPI HTTP server, and OpenAI-compatible chat completions endpoint.
- Best use cases: AI agent backend, model comparison, code review, research synthesis, brainstorming, security review, and eval-driven model routing.
- Evaluation support: HumanEval code execution, GSM8K numeric grading, MMLU multiple choice, GPQA science questions, and custom benchmark datasets.
- Not a: model provider, vector database, RAG framework, or hosted SaaS.
Fusion Engine is written to be discoverable for developers comparing or building systems around these terms:
- Fusion API
- OpenRouter Fusion / Openrouter Fusion
- Fusion Fable / Fable Fusion
- self-hosted Fusion API
- OpenRouter Fusion alternative
- multi-model fusion API
- LLM fusion engine
- AI model ensemble
- judge model synthesis
- OpenAI-compatible model router
A single model has blind spots. Asking several models the same question and fusing their answers gives you:
- Higher reliability — agreement across independent models is a strong signal; disagreement is a flag worth surfacing.
- Coverage — different models are strong at different things (reasoning, code, recency, tone). The judge keeps the best of each.
- Control — you own the panel, the judge prompt, and the data path. Run it locally, pin your own models, swap judges per task.
┌──────────────────────────────────────┐
│ FusionEngine │
│ │
"Analyze the ┌──┤ 1. load panel (budget/quality/code) │
competitive ─────┘ │ 2. fan out the prompt in parallel │
landscape..." │ │
│ ┌─────────────────────┐ │
│ ┌───▶│ model A (OpenRouter)│───┐ │
│ │ └─────────────────────┘ │ │
prompt ─────────────┼───┼───▶│ model B (OpenRouter)│───┼─┐ │
│ │ └─────────────────────┘ │ │ │
│ └───▶│ model C (OpenRouter)│───┘ │ │
│ └─────────────────────┘ │ │
│ ▼ │
│ 3. collect responses ┌────────────┐
│ (text, latency, │ COLLECT │
│ tokens, cost) └─────┬──────┘
│ │ │
│ 4. judge synthesizes ┌─────▼──────┐
│ all responses ──────▶│ JUDGE │
│ (judges/<panel>.md) │ model │
│ └─────┬──────┘
└─────────────────────────────────┼──────┘
▼
┌───────────────────────────┐
│ FusionResult │
│ • synthesized answer │
│ • per-model responses │
│ • cost / latency / usage │
└───────────────────────────┘
Flow: prompt → parallel dispatch → N models → collect → judge → synthesized answer
# 1. Get the code
git clone https://github.com/<owner>/fusion-engine.git
cd fusion-engine
# 2. Install the CLI and core dependencies
python3 -m pip install -e .
# 3. Configure your OpenRouter key (required)
export OPENROUTER_API_KEY=sk-or-v1-...
# ...or: cp .env.example .env and edit it
# 4. Run a fusion query
fusion run "What are the implications of quantum computing on cryptography?" -p budgetUseful CLI flags:
# Show each model's answer + latency + cost, not just the synthesis
fusion run "Compare REST vs gRPC for microservices" -p quality -v
# Code-focused panel with live web search enabled
fusion run "Review this auth flow for vulnerabilities" -p code --web-search
# List configured panels and their member models
fusion panelsA panel is a named set of member models plus a judge. Panels live in
panels/*.json — those files are the source of truth. Model IDs use
OpenRouter's provider/model form.
| Panel | Member models | Judge (model · template) | Best for | Est. cost / query* |
|---|---|---|---|---|
budget |
xiaomi/mimo-v2.5, deepseek/deepseek-v4-flash, xiaomi/mimo-v2.5-pro |
qwen/qwen3.7-plus · default |
Drafts, summaries, brainstorming, high-volume runs | $0.02–0.05 |
quality |
anthropic/claude-fable-5, openai/gpt-5.5 |
anthropic/claude-opus-4 · deep_research |
High-stakes analysis, research, hard reasoning | $0.50–1.00 |
code |
openai/codex, anthropic/claude-opus-4, deepseek/deepseek-v4-pro |
anthropic/claude-opus-4 · code_review |
Code review, debugging, security analysis, codegen | $0.30–0.60 |
self_fuse |
deepseek/deepseek-v4-pro ×2 (independent samples) |
deepseek/deepseek-v4-pro · default |
Measuring how much fusion alone helps, model held constant | ~$0.01 |
* From each panel file's estimated_cost_per_query. Actual cost depends on
prompt/response length and OpenRouter's live per-model pricing (OpenRouter is the
source of truth for rates). Run with -v to see the exact per-query cost.
Each panel is a JSON file in panels/. The schema:
{
"schema_version": 1,
"name": "quality",
"description": "Frontier models for maximum answer quality.",
"models": [
{ "slug": "anthropic/claude-fable-5", "role": "panelist", "max_tokens": 8192 },
{ "slug": "openai/gpt-5.5", "role": "panelist", "max_tokens": 8192 }
],
"judge_model": "anthropic/claude-opus-4",
"judge_template": "deep_research",
"estimated_cost_per_query": { "min": 0.50, "max": 1.00, "currency": "USD", "unit": "query" }
}judge_template names a file in judges/ (without the .md). max_tokens, if
set on a model entry, is forwarded to OpenRouter for that panel member. Drop in a
new <name>.json and it becomes selectable with -p <name>.
After the panel responds, the judge model is given a synthesis prompt plus
all the collected answers. Judge templates live in judges/*.md and are
selected per panel via the judge_template field. The repo ships
default, deep_research, code_review, creative, and tool_synthesis.
They generally
instruct the judge to:
- Read every panel response without assuming any one is correct.
- Identify points of agreement (treat as high-confidence) and disagreement (surface, don't silently drop).
- Resolve conflicts on the merits, keeping the strongest reasoning from each.
- Produce one answer — not a list of "Model A said… Model B said…".
- Flag remaining uncertainty rather than papering over it.
Specialized templates tune this per use case — e.g. code_review (used by the
code panel) prioritizes correctness and security; deep_research (used by
quality) weights depth and rigor; creative favors originality. Because the
templates are plain Markdown you check into the repo, you can edit synthesis
behavior without touching code.
FusionEngine.fuse() is async and takes an explicit list of model slugs (or
panel model dictionaries with slug and optional max_tokens) plus a judge
model. The CLI resolves a panel name like quality by reading
panels/*.json.
import asyncio
import json
from pathlib import Path
from fusion import FusionEngine # run from the project dir; see note below
def load_panel(name: str):
cfg = json.loads(Path(f"panels/{name}.json").read_text())
return cfg["models"], cfg["judge_model"]
async def main():
panel, judge_model = load_panel("quality")
# Reads OPENROUTER_API_KEY from the environment by default.
engine = FusionEngine()
result = await engine.fuse(
"What are the implications of quantum computing on cryptography?",
panel=panel,
judge_model=judge_model,
web_search=False,
)
# The fused, synthesized answer (from the judge):
print(result.answer)
# Per-model detail (list[PanelResponse]):
for r in result.panel_responses:
status = r.error or f"{r.latency_ms:7.0f}ms ${r.cost_usd:.4f}"
print(f"{r.model:40s} {status}")
if r.ok:
print(r.content)
# Run-level metadata:
print("judge:", result.judge_response.model)
print("total cost: $%.4f" % result.total_cost)
asyncio.run(main())PanelResponse exposes model, content, tokens_in, tokens_out,
latency_ms, cost_usd, error, and the ok property. FusionResult exposes
answer, panel_responses, judge_response, total_cost, total_latency_ms,
and the successful_panel property.
Imports. The public module import is currently
from fusion import FusionEngine, FusionResult, PanelResponse. Installing withpython3 -m pip install -e .also gives you thefusionandfusion-engineconsole scripts.
Prefer to call Fusion over HTTP — from another service or an agent — instead of
shelling out to the CLI? server.py is a thin
FastAPI wrapper over the same engine, sharing
panel resolution with the CLI via panels.py.
python3 -m pip install -e ".[server]"
export OPENROUTER_API_KEY=sk-or-v1-...
uvicorn server:app --host 127.0.0.1 --port 8000 # or: python3 server.pyIf you expose the API beyond localhost, set FUSION_SERVER_API_KEY and send
Authorization: Bearer <value> on endpoints that spend credits (/fuse and
/v1/chat/completions).
| Method & path | Purpose |
|---|---|
GET /health |
Liveness, plus whether an OpenRouter key is configured. |
GET /panels |
List configured panels with members, judge, and est. cost. |
GET /panels/{name} |
Full JSON config for one panel. |
POST /fuse |
Run a fusion; return the synthesized answer + per-model detail. |
GET /v1/models |
OpenAI-compatible model list, one model per panel (fusion/<panel>). |
POST /v1/chat/completions |
OpenAI-compatible chat completion with fused tool-call support. |
POST /fuse body — only prompt plus one of panel/models is required:
{
"prompt": "Compare REST vs gRPC for microservices",
"panel": "quality",
"judge_model": null,
"judge_template": null,
"web_search": false
}Pass models (a list of OpenRouter slugs) instead of panel to fuse an ad-hoc
set, and judge_model / judge_template to override a panel's defaults.
curl -s localhost:8000/fuse -H 'content-type: application/json' \
-d '{"prompt":"Explain CRDTs","panel":"budget"}' | jq .answerThe response is the full FusionResult as JSON — answer, panel_responses[]
(each with content, tokens, latency, cost, error), judge_response,
total_cost, and total_latency_ms. Interactive docs live at /docs.
OpenRouter offers a hosted multi-model "fusion" feature. Fusion Engine is a self-hosted OpenRouter Fusion alternative with more control:
| Fusion Engine (this project) | Hosted Fusion | |
|---|---|---|
| Judge prompts | Yours — plain Markdown in judges/, editable per panel |
Provider-defined |
| Panel composition | Yours — any OpenRouter models, defined in panels/*.json |
Limited / provider-curated |
| Where it runs | Locally (or any host you control) | Provider-side |
| Transparency | Full per-model responses, latency, tokens, cost | Aggregated |
| Vendor lock-in | None — it's your code; swap providers freely | Tied to the provider's feature |
| Cost | Pay only OpenRouter token costs | Same, plus whatever the feature adds |
You still use OpenRouter for the actual model calls (one key, many providers) — but the orchestration, judging, and policy are yours.
The whole premise is that a panel beats any single model. Don't take it on
faith — measure it. The evals/ harness scores a panel's fusion against the
right baselines on the same items.
export OPENROUTER_API_KEY=sk-or-v1-...
python3 evals/run_eval.py --panel quality --dataset evals/datasets/sample.jsonl
# iterate cheaply with --limit 5Don't hand-write items — pull real benchmarks with evals/prepare.py, then point
the runner at the generated dataset:
python3 -m pip install -e ".[eval]" # only needed for mmlu / gpqa
python3 evals/prepare.py gsm8k --limit 100 # -> evals/datasets/gsm8k.jsonl
python3 evals/run_eval.py --panel quality --dataset evals/datasets/gsm8k.jsonl| Benchmark | Tests | Grader | Source |
|---|---|---|---|
gsm8k |
grade-school math reasoning | numeric |
GitHub, ungated |
humaneval |
Python synthesis, run against unit tests | code_exec |
GitHub, ungated |
mmlu |
57-subject knowledge (multiple choice) | multiple_choice |
HF cais/mmlu |
gpqa |
graduate-level science (multiple choice) | multiple_choice |
HF Idavidrein/gpqa — gated (accept terms + huggingface-cli login) |
gsm8k/humaneval download directly (httpx); mmlu/gpqa use the datasets
library. --limit N takes a seeded random sample to bound cost. code_exec runs
model-generated code — sandbox it (container/VM) for untrusted models.
A dataset is JSONL, one item per line (the format prepare.py emits, and what you
write for a custom set):
{"id": "mc1", "prompt": "...", "target": "B", "grader": "multiple_choice", "category": "science"}For each item the harness runs three kinds of system and grades each answer:
fusion:<panel>— the whole panel + judgesingle:<model>— each panel member on its ownjudge_alone:<model>— the judge model alone, with no panel
That last one is the baseline most "ensembles win" claims forget: fusion adds the
panel on top of the judge, so it has to beat the judge answering solo — and the
best single member — to justify its extra cost. The report prints per-system
accuracy/cost/latency plus a paired comparison (Δaccuracy, win/tie/loss, and a
bootstrap 95% CI) so you can tell a real gain from noise, and weigh it against the
N× cost. Graders ship for multiple_choice, numeric, exact_match, and
contains (in evals/graders.py); add your own there.
Beyond a one-off check, this is how you tune panels — swap models, judges, or templates and keep what moves the metric for your workload.
| Resource | Link |
|---|---|
| Contributing guide | CONTRIBUTING.md |
| Security policy | SECURITY.md |
| CI workflow | .github/workflows/ci.yml |
| License | LICENSE |
Issues and pull requests should include enough context to reproduce the behavior,
especially for model, panel, judge-template, and benchmark changes. Security
reports should follow the private disclosure path in SECURITY.md.
- Web UI — a browser front-end (on top of the HTTP API) for running fusions and diffing model answers.
- Streaming — stream panel responses and the synthesis as they arrive.
- Result caching — cache by
(prompt, panel)to avoid paying twice for identical runs. - More evals — add an LLM-as-judge grader (on a neutral model) for
open-ended tasks, more benchmarks (MATH, SWE-bench), and per-call result
caching so re-runs are free. (Benchmarks + harness already live in
evals/: GSM8K, HumanEval, MMLU, GPQA with numeric/code-exec/multiple-choice graders.) - Package namespace — add a stable
fusion_engineimport package while preserving the currentfusionmodule import.
MIT. See LICENSE.