Spec-Driven Development for AI agents — plan, code, ship.
Docs · Demo · Architecture · Roadmap · Contributing
New here? Start at the docs → aifactory.freundcloud.com — guided demo, screenshots, architecture, and the full getting-started guide.
AIFactory turns GitHub issues into shipping code via a coordinated planner / coder / QA agent pipeline. You bring an issue (or a one-line task description) — AIFactory writes a spec, plans the work, codes it in an isolated git worktree, validates against the spec's acceptance criteria, and hands you back a merge-ready branch.
It also runs standalone or as the build node of the Factory line: PFactory plans and governs the work and emits a signed Task Contract; AIFactory builds it; TFactory verifies the result against the declared acceptance criteria and the deployed URL; CFactory watches all four. The contract that was approved is the contract AIFactory builds.
You watch the whole thing happen live in the Agent Console — read-only by default, one-click Attach when you want to drive — or open the task in Mission Control, a full-page three-pane workspace (plan · live activity + console · preview / files / review). See docs/docs/concepts/mission-control-workspace.md.
Where we are (June 2026, v3.6.x) — recent work on top of the enterprise and MCP shipsets:
- Parallel build executor + per-worker observability — independent subtasks build concurrently, each worker's tokens, cost, and progress tracked and shown live; honest completion accounting (a failed plan-load no longer reads as a successful build).
- Factory-line integration — AIFactory ingests a signed Task Contract (with the planned acceptance criteria) from PFactory and hands the built branch plus its deployed URL to TFactory, which verifies it with visible screenshot and recording evidence. One correlation key threads the whole run.
- Delegation (#92) — hand the coder phase off to GitHub Copilot Coding Agent or GitLab Duo Workflow while AIFactory keeps the planning and governance. Hybrid only: the planner runs on Claude, the structured plan lands as a comment on the issue, then the provider's agent codes. See docs/docs/concepts/delegation.md.
- Portal-managed Git clones (#82) — point the portal at a Git URL, it clones into a workspace root (laptop default ~/.aifactory/workspaces/, Helm-templated PVC on K8s). Stored Personal Access Tokens encrypted at rest. Required for SaaS / Kubernetes deployments. See docs/docs/concepts/portal-clones.md.
- Scoped MCP API keys (#154) — replace the host-wide admin token at ~/.aifactory/.token with per-developer scope-gated acw_ keys. Mint via Settings → API Keys, drop in $AIFACTORY_MCP_KEY, done. Legacy admin token still works as a wildcard fallback. See docs/docs/concepts/mcp-stdio-keys.md.

Also current: a large catalog of MCP tools across stdio + remote HTTP+SSE transports, the /handover skill for Claude Code, default MCP servers that auto-enable per project, and Remote Control for claude.ai/code on any device.

See guides/HANDOVER_WORKFLOW.md for the developer flow, guides/CLAUDE_CODE_MCP_TOOLS.md for the stdio tool catalog, and guides/REMOTE_MCP_SERVER.md for the HTTP+SSE server (Cursor / Continue.dev / non-Claude clients).
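
The scoped-key behaviour described above (prefer a per-developer key from $AIFACTORY_MCP_KEY, fall back to the legacy admin token as a wildcard) can be sketched roughly as follows. This is an illustrative sketch only, not AIFactory's implementation: the `Credential` type, the `key_scopes` lookup table, and the scope names such as `tasks:read` are hypothetical, and only the environment variable name and the legacy token path come from this README.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Credential:
    token: str
    scopes: frozenset  # "*" is treated as the wildcard granted to the legacy admin token


def resolve_credential(env: Dict[str, str], legacy_token_path: Path,
                       key_scopes: Dict[str, Set[str]]) -> Optional[Credential]:
    """Prefer a scoped key from AIFACTORY_MCP_KEY, else fall back to the legacy token file."""
    key = env.get("AIFACTORY_MCP_KEY", "").strip()
    if key:
        # key_scopes is a hypothetical stand-in for whatever store maps keys to scopes.
        return Credential(key, frozenset(key_scopes.get(key, set())))
    if legacy_token_path.is_file():
        # The legacy host-wide admin token acts as a wildcard fallback.
        return Credential(legacy_token_path.read_text().strip(), frozenset({"*"}))
    return None


def is_allowed(cred: Optional[Credential], required_scope: str) -> bool:
    """A call is allowed if the credential holds the scope or the wildcard."""
    if cred is None:
        return False
    return "*" in cred.scopes or required_scope in cred.scopes


if __name__ == "__main__":
    legacy = Path.home() / ".aifactory" / ".token"
    cred = resolve_credential(dict(os.environ), legacy, key_scopes={})
    print("tasks:read allowed:", is_allowed(cred, "tasks:read"))
```

The point of the sketch is the ordering: a scoped key, when present, always wins over the legacy token, so a shared host never silently escalates to admin power.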
AIFactory ships 7 major enterprise features — multi-tenant isolation, observability, audit hardening, and IdP integration for regulated deployments. All features are opt-in via Helm values; default deployments remain unchanged.
| Capability | Issue | Concept doc |
|---|---|---|
| SAML 2.0 + SCIM 2.0 — Legacy IdP federation (ADFS-era banks, Azure AD provisioning) | #41 | saml-scim |
| Tenant Isolation Mode — Per-tenant K8s namespace + NetPol + S3 + Vault + leader election | #36 | tenant-isolation |
| LiteLLM Gateway — Per-org budget + rate-limit + allowlist + PII-redacted audit log | #38 | litellm-gateway |
| Bedrock + Vertex Routing — Cloud-provider LLMs (AWS, Google) via LiteLLM | #39 | cloud-llm-routing |
| Signed Audit-Chain Anchor — Daily HMAC-anchored chain for ISO 27001 A.12 compliance | #43 | audit-anchor |
| OpenTelemetry Distributed Tracing — W3C traceparent across web + agent + subprocess | #42 | observability-tracing |
| gVisor Sandbox — Agent pods opt-in to gVisor RuntimeClass for kernel-level isolation | #37 | — |
Multi-replica support: S3 workspace storage + Redis pub/sub (#40) enable horizontally scaled deployments.
What's next: v1.2 roadmap includes Claude-on-LiteLLM enforcement wrapper, per-tenant audit anchors, and SAML Single Logout — tracked in Epic #204.
- Spec-first, not vibe-first. Every agent run starts from a written, reviewable spec with acceptance criteria. Plans are editable before code is written.
- Multi-provider by design. Pick a model per phase. Plan with Claude Opus, code with a cheap local Ollama qwen3, validate with Sonnet. Anthropic / OpenAI / Ollama / Gemini / Codex / any OpenAI-compatible endpoint.
- MCP control plane. 27 tools across two transports let any MCP-aware editor inspect and direct AIFactory. The /handover skill turns "this is bigger than I thought" into an autonomous overnight run with one keystroke. Scope-gated per-developer keys in v1.1 mean shared hosts and SaaS deployments don't hand out admin power.
- Provider-agent delegation. On GitHub repos, AIFactory can hand the coder phase off to Copilot Coding Agent; on GitLab, to Duo Workflow. AIFactory still authors the spec + plan; the provider's agent does the typing. Cuts Claude spend ~10× for delegated tasks.
- Portal-managed clones. Point AIFactory at a Git URL and it clones into a workspace PVC (on K8s) or a configurable workspace root (on laptops). Stored PATs are encrypted at rest. Required for SaaS / Kubernetes installs.
- Infra-aware out of the box. A catalog of default MCP servers (Kubernetes, AWS, Azure, GitHub) auto-enables per project when markers + credentials line up. Read-only by default, audit-logged, CVE-aware version pins.
- One screen to drive a task. Mission Control puts the plan, the agent's live activity + embedded terminal, and the output (running preview, diff, or merge controls) side by side — no tab-switching while a build runs.
- Isolated by default. Each task runs in its own git worktree. Nothing touches your working tree until you merge.
- Auditable. Hash-chained audit log, on-disk specs+plans+QA reports, full SOC2 evidence catalog in the enterprise build.
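
As a rough illustration of what "hash-chained audit log" means in the last bullet, the sketch below links each entry to the hash of the one before it, so editing any earlier entry invalidates everything after it. It is a generic example of the technique, assuming SHA-256 over canonical JSON; the field names and the genesis value are hypothetical and not AIFactory's actual log format.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # hypothetical placeholder hash for the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict], event: Dict) -> Dict:
    """Append an event, binding it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

For example, after appending a few events with `append_entry`, `verify_chain` returns `True`; changing a field inside any stored event makes it return `False`.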
git clone https://github.com/olafkfreund/AIFactory
cd AIFactory
npm run install:all
claude setup-token # paste into apps/backend/.env as CLAUDE_CODE_OAUTH_TOKEN

Start the two servers (in separate terminals):
cd apps/web-server && python -m server.main # :3101
cd apps/frontend-web && npm run dev # :3100

Open http://localhost:3100 and create your first project.
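
If the page does not load, a quick way to see which of the two servers is missing is to probe the ports named above. The following is a small standalone sketch using only the Python standard library; the helper names are hypothetical and it is not part of the AIFactory repo. It only checks that something accepts TCP connections on :3100 and :3101, not that the service behind the port is healthy.

```python
import socket
from typing import Dict, Optional


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def check_servers(host: str = "localhost",
                  ports: Optional[Dict[str, int]] = None) -> Dict[str, bool]:
    """Probe the frontend (:3100) and web server (:3101) from the quick start."""
    if ports is None:
        ports = {"frontend-web": 3100, "web-server": 3101}
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_servers().items():
        print(f"{name}: {'listening' if up else 'not reachable'}")
```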
Full installation guide: Getting Started →
A 45-second terminal walkthrough of the /handover workflow: clone the demo repo, file a GitHub issue, type /handover in Claude Code, AIFactory's planner → coder → QA pipeline lands at a merge-ready branch. Every artifact shown was produced by a real agent run — the recording compresses the timeline.
The repo also ships a scripted end-to-end demo that exercises the whole pipeline against a public sample repo:
./scripts/demo.sh

It seeds 3 GitHub issues, registers the demo repo with your portal, imports the issues as backlog tasks, prompts you to drive Claude Code from the terminal, then kicks off an autonomous build — all in about 90 seconds. Pass --yolo to skip the Enter-prompts between steps.
Walkthrough with screenshots + browser-side video: Demo →
Screenshots are auto-captured by scripts/capture-screenshots.ts — refresh them with npm -w apps/frontend-web run capture-screenshots.
The full documentation lives at https://aifactory.freundcloud.com/:
- Getting Started — install + first task
- Demo — guided end-to-end walkthrough
- Concepts — spec-driven development, multi-provider routing, the rmux Live Console
- Architecture — agents, data flow, security model, Mermaid diagrams
- Wiki — FAQ, troubleshooting, glossary
- Compliance — SOC 2 evidence, GDPR, encryption-at-rest
Legacy guides (pre-2026-05-26 rewrite) are archived under docs-archive/2026-05-26/ and remain searchable in git history.
- Frontend — React 19 + Vite + xterm.js + Tailwind v4
- Web Server — FastAPI + WebSocket, Postgres + Alembic migrations
- Agent Runtime — Python 3.12, Claude Agent SDK, provider abstraction over Anthropic / OpenAI / Ollama / Gemini / Codex
- Deploy — Helm chart (charts/aifactory/), distroless cosign-signed images, OIDC SSO, KMS-backed encryption at rest
Branching: dev is the working branch. Branch from origin/dev, sign your commits (git commit -s), open PRs against dev. main is a release branch and only receives promotion merges.
git fetch origin
git checkout -b feat/my-feature origin/dev
git push -u origin feat/my-feature
gh pr create --base dev

CI runs ruff + pytest + frontend typecheck + Postgres acceptance + multiple compliance gates on every PR. Full guide: Contributing →
Dual-licensed under MIT or GPL-3.0 at your option — see LICENSE, LICENSE-MIT, and LICENSE-GPL.
Built with the Claude Agent SDK, FastAPI, Docusaurus, and rmux (terminal multiplexer fork in Rust, used for the Live Console).





