TFactory

Autonomous test generation and execution platform. Started as a sister project to AIFactory — now a standalone product you can drive from any tool.

Hand TFactory a finished feature's acceptance criteria — from AIFactory, Claude Code, or anything else, via the MCP control plane or a plain file (markdown / Gherkin / EARS, see guides/spec-sources.md). It generates tests aligned to those criteria across the lane spine (unit / browser / api / integration / mutation), runs them in an isolated sandbox, evaluates quality with a five-signal verdict, grades each acceptance criterion against a test that actually ran, commits the suite to the feature branch, and posts a triage report to the PR — autonomously.

Where we are (June 2026, v0.9.x). The planner auto-runs on ingest, and the AIFactory handoff carries the signed Task Contract plus the deployed URL — so TFactory tests the declared acceptance criteria against the real deployment. The browser lane now runs in a reproducible per-task Nix toolchain inside an ephemeral Kubernetes Job (RFC-0005 Tier A), captures screenshots and video recordings, and surfaces them as visible evidence in the portal and in the CFactory cockpit. An acceptance-criteria fidelity ledger reports an honest "verified X/Y" per criterion, and authenticated targets — including ones gated by TOTP two-factor auth — can be tested against a disposable identity provider with zero production credentials.

Quickstart (NixOS / flake-based)

# One-command dev environment via the flake:
nix develop

# (inside the shell)
tfactory-minimal-venv   # creates apps/backend/.venv with just pytest+pytest-asyncio
tfactory-test           # runs the non-SDK backend suite

# For the full backend SDK install (graphiti, claude-agent-sdk, etc.):
bootstrap-venv

The dev shell brings in Python 3.13, Node 22, uv, git, gh, just, ripgrep, jq and docker-client, plus four shell functions: bootstrap-venv, tfactory-minimal-venv, tfactory-test, verify-fork.

For auto-loading via direnv:

nix profile install nixpkgs#nix-direnv
direnv allow

Non-Nix users can fall back to npm run install:backend (per the Quickstart on the docs site) — the Nix path just makes setup deterministic.

Note for npm users inside the Nix shell: the devShell sets NODE_ENV=production, which makes npm install skip devDependencies (including vitest). If you're inside nix develop and running npm install in apps/frontend-web/, unset NODE_ENV first. Details are in guides/e2e-smoke.md.

Running the portal

# Backend (FastAPI on :3102)
cd apps/web-server
source .venv/bin/activate    # if you have a per-app venv
python -m server.main

# Frontend (Vite dev server on :3100)
cd apps/frontend-web
npm install                  # unset NODE_ENV first if inside nix develop
npm run dev

Then visit http://localhost:3100 for the TFactory portal.

The portal's task-detail view (apps/frontend-web/src/components/tfactory/) has seven tabs: Status / Lanes / Verdicts / Report / Acceptance / Logs / Evidence. Four of them in more detail:

Acceptance — the acceptance-criteria fidelity ledger ("Verified X/Y"), with each criterion linked to the test and screenshots that prove it.
Evidence — the browser lane's captured screenshots and video recordings, plus per-test evidence and visual-regression baselines.
Lanes — the Unit / Browser / API / Integration / Mutation lane spine.
Logs — a WebSocket live tail of the run.

High-level architecture

flowchart TD
    H["AIFactory branch / Claude Code / any tool<br/>handover: signed contract + deployed URL"] --> P["Planner<br/>(test_plan.json)"]

    P --> U["Gen-Unit"]
    P --> B["Gen-Browser"]
    P --> A["Gen-API"]
    P --> I["Gen-Integration"]
    P --> M["Gen-Mutation"]

    U --> X["Executor<br/>(sandboxed per task)"]
    B --> X
    A --> X
    I --> X
    M --> X

    X --> E["Evaluator<br/>(five-signal verdict)"]
    E --> T["Triager"]
    T --> R["git commit + PR comment"]

The middle row is the five-lane spine: one generator per modality (unit, browser, api, integration, mutation). Around it sit the rest of the five pipeline stages (Planner / per-lane Generators / Executor / Evaluator / Triager), fed by a spec-aware handover from AIFactory. The stages auto-advance via TFACTORY_AUTO_* env vars; each stage writes its outputs to ~/.tfactory/workspaces/{project}/specs/{spec}/ and hands off to the next through a fire-and-forget scheduler. See apps/backend/agents/ for each agent.

Reproducible execution (RFC-0005 Tier A)

The cluster pods have no container runtime, so the browser lane runs in an ephemeral Kubernetes Job using a per-task Nix toolchain: the planner declares the environment, a flake materializes the exact tools (including a version-matched Playwright and its browsers), and the test runs against the real app inside the Job. Screenshots land in findings/screenshots/ and Playwright recordings in findings/videos/; both are served by the portal and rendered in the Acceptance and Evidence tabs.
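As a rough illustration of "ephemeral Job plus per-task Nix toolchain", here is the shape of a Kubernetes Job manifest built as a Python dict. The fields backoffLimit, ttlSecondsAfterFinished and restartPolicy are standard batch/v1 Job fields; the helper name, labels, image, flake reference and command line are assumptions, not TFactory's actual Job spec.

```python
def browser_job_manifest(task_id: str, flake_ref: str, image: str, ttl_seconds: int = 600) -> dict:
    """Sketch of a batch/v1 Job that runs Playwright inside a Nix dev shell."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"tfactory-browser-{task_id}",  # hypothetical naming; must be a lowercase DNS name
            "labels": {"app.kubernetes.io/component": "browser-lane"},
        },
        "spec": {
            "backoffLimit": 0,  # sketch choice: no retries, so a flaky pass can't mask a failure
            "ttlSecondsAfterFinished": ttl_seconds,  # TTL controller deletes the finished Job and its pods
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "runner",
                            "image": image,  # assumption: an image with Nix and flakes enabled
                            # nix develop materializes the pinned toolchain, then runs the tests inside it
                            "command": ["nix", "develop", flake_ref, "--command", "npx", "playwright", "test"],
                        }
                    ],
                },
            },
        },
    }
```

Serialized with json.dumps, such a dict can be piped to kubectl apply -f -, since kubectl accepts JSON manifests. ttlSecondsAfterFinished is what makes the Job ephemeral rather than something an operator has to clean up.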

Status by lane

The lane spine is modality-based (Decision 2). Security scanning is delegated to dedicated pipelines and is out of scope here; TFactory focuses on functional and feature testing.

Lane	Status	Runtime	Coverage	Evidence
Unit	Active	`tfactory-runner-pytest` (Python) / `tfactory-runner-jest` (TypeScript)	line (cobertura / lcov)	—
Browser	Active	Nix toolchain in a k8s Job (Playwright); host fallback where applicable	n/a (line coverage doesn't apply when the test drives the browser)	screenshots, video, trace
API	Active	per-framework image + HTTP HAR recorder	line where applicable	network.har
Integration	Active	per-framework image + AppRuntime (multi-service)	line where applicable	network.har, service logs
Mutation	Active	`mutmut` (Python) / Stryker (TypeScript) — one-mutation-per-run probe in the Evaluator	per-mutant (killed / survived)	—
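To make the Mutation lane's "one-mutation-per-run probe" concrete, here is a toy version using only Python's ast module. It is not how mutmut or Stryker are driven: the source snippet, the single `+` to `-` operator flip and the function names are illustrative assumptions. The probe applies exactly one mutation, reruns a test, and reports the mutant as killed (the test failed) or survived (the test still passed).

```python
import ast

# Illustrative code under test.
SOURCE = """
def total(prices, shipping):
    return sum(prices) + shipping
"""


class FlipFirstAdd(ast.NodeTransformer):
    """Applies exactly one mutation: the first `+` visited becomes `-`."""

    def __init__(self) -> None:
        self.done = False

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)
        if not self.done and isinstance(node.op, ast.Add):
            node.op = ast.Sub()
            self.done = True
        return node


def load(source: str, mutate: bool) -> dict:
    """Compile the source (optionally mutated) and return its namespace."""
    tree = ast.parse(source)
    if mutate:
        tree = ast.fix_missing_locations(FlipFirstAdd().visit(tree))
    namespace: dict = {}
    exec(compile(tree, "<candidate>", "exec"), namespace)
    return namespace


def probe(check) -> str:
    check(load(SOURCE, mutate=False))  # the test must pass on the original code first
    try:
        check(load(SOURCE, mutate=True))
    except AssertionError:
        return "killed"
    return "survived"


def strong_test(ns: dict) -> None:
    assert ns["total"]([10, 20], 5) == 35


def weak_test(ns: dict) -> None:
    assert ns["total"]([10, 20], 0) == 30  # shipping=0 hides the + vs - difference
```

The survived case is the useful signal: weak_test passes even against the mutant because shipping=0 makes `+` and `-` indistinguishable, which is exactly the kind of weak assertion a per-mutant killed/survived report exposes.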

The Planner picks each subtask's lane from its (language, framework) pair via the framework registry (frameworks/{pytest,jest,playwright}/descriptor.yaml). New languages and additional pipelines slot into the same spine through new FrameworkDescriptors; no new lanes are needed.
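A registry keyed by (language, framework) is what lets new languages reuse the fixed lane spine. The sketch below is hypothetical: the real descriptors are YAML files under frameworks/, and the dataclass fields and function names are assumptions. Only the two runner names come from the lane table.

```python
from dataclasses import dataclass

LANES = ("unit", "browser", "api", "integration", "mutation")  # the fixed spine


@dataclass(frozen=True)
class FrameworkDescriptor:
    language: str
    framework: str
    runner: str             # hypothetical field: runtime image or command
    lanes: frozenset[str]   # hypothetical field: lanes this framework can serve


_REGISTRY: dict[tuple[str, str], FrameworkDescriptor] = {}


def register(desc: FrameworkDescriptor) -> None:
    unknown = desc.lanes - set(LANES)
    if unknown:  # descriptors plug into existing lanes; they never add new ones
        raise ValueError(f"unknown lanes: {sorted(unknown)}")
    _REGISTRY[(desc.language, desc.framework)] = desc


def resolve(language: str, framework: str, lane: str) -> str:
    desc = _REGISTRY.get((language, framework))
    if desc is None or lane not in desc.lanes:
        raise LookupError(f"no runner for {language}/{framework} in lane {lane!r}")
    return desc.runner


register(FrameworkDescriptor("python", "pytest", "tfactory-runner-pytest", frozenset({"unit"})))
register(FrameworkDescriptor("typescript", "jest", "tfactory-runner-jest", frozenset({"unit"})))
```

Registering, say, a hypothetical go/gotest descriptor would make resolve("go", "gotest", "unit") work without touching LANES, which is the "no lane additions required" property.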

Acceptance-criteria fidelity

A passing test is not the same as a verified requirement. The Triager builds an acceptance-criteria ledger that maps each criterion to the tests that exercise it and grades it verified only when at least one of those tests actually passed — reporting an honest "verified X/Y", never a blanket "done". For interactive UI criteria, the linked evidence is the screenshot of the rendered page and a recording of the test driving it.
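The grading rule above is simple enough to state as code. In this sketch a criterion counts as verified only if at least one linked test has the outcome "passed"; a failed, skipped or never-run test proves nothing. The data shapes (criterion IDs mapped to test names, test names mapped to outcome strings) are assumptions for illustration, not the Triager's actual ledger format.

```python
def build_ledger(
    criteria: dict[str, list[str]],  # criterion ID -> names of tests linked to it
    outcomes: dict[str, str],        # test name -> "passed" / "failed" / "skipped"
) -> tuple[dict[str, bool], str]:
    """Grade each criterion and return (ledger, "verified X/Y")."""
    ledger = {
        # A test missing from `outcomes` never ran, so .get() returns None: not verified.
        criterion: any(outcomes.get(test) == "passed" for test in tests)
        for criterion, tests in criteria.items()
    }
    verified = sum(ledger.values())
    return ledger, f"verified {verified}/{len(ledger)}"
```

A criterion with no linked tests still counts toward Y, so an untested requirement lowers the ratio instead of silently dropping out of the report.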

Authenticated and MFA-gated targets

Agents often need to reach real services behind a login. TFactory's .tfactory.yml auth schema supports form, API-token, basic-auth and TOTP two-factor credentials, with an ordered login-step flow for SSO. For 2FA we do not bypass MFA: following RFC-0007's Class C pattern the pipeline can provision a disposable identity provider (an ephemeral Keycloak), seed a user whose OTP secret it owns, generate valid RFC-6238 codes at run time with a fill_totp login step, capture the authenticated page, and tear the IdP down — with zero production credentials. See guides/credentials.md and the Credentials page.

End-to-end smoke

Once you have a real AIFactory project, a Claude API key and Docker:

scripts/e2e-smoke.sh --list           # list the verification scenarios
scripts/e2e-smoke.sh --dry-run --all  # sanity-check the runner (no env, no LLM)

export ANTHROPIC_API_KEY=sk-ant-...
export TFACTORY_AIFACTORY_ROOT=$HOME/Source/GitHub/MyApp
export TFACTORY_AIFACTORY_BRANCH=feature/...
scripts/e2e-smoke.sh --all

The full walkthrough, including the manual scenarios (mutation, hallucination guard, docker-down), is in guides/e2e-smoke.md.

Tests

Suite	What	Time
Backend non-SDK (`tests/test_*.py`)	Pure-Python primitives + agent loops with a mocked SDK	seconds
Frontend (`apps/frontend-web/src/**/*.test.tsx`)	vitest + React Testing Library	seconds
End-to-end smoke (`scripts/e2e-smoke.sh`)	Real LLM + Docker + git + gh — operator-driven	manual

CI runs the first two on every commit; the third is operator-driven.

# Backend
PYTHONPATH=apps/backend apps/backend/.venv/bin/pytest -q tests/

# Frontend (under nix devShell, unset NODE_ENV first)
cd apps/frontend-web && ../../node_modules/.bin/vitest run

# Fork-hygiene check (every stray AIFactory reference is allowlisted explicitly)
scripts/verify-fork.sh --no-import

Connect to your environment — Credential Broker

Agents often need to reach real services and cloud environments (a staging API, a Kubernetes cluster, a GCP/AWS/Azure project) to plan and run tests — but secrets must never land in the repo. The Credential Broker (epic #62) resolves credentials from a pluggable backend and exposes them to the agents ephemerally:

Backends: Azure Key Vault, AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, local sops / age / agenix, or plain env. One ref syntax (vault:path#field, gcp-sm://proj/secret, sops:file#key, …); cloud SDKs load lazily so an absent package never breaks startup.
Ephemeral and redacted: file credentials (kubeconfig, GCP ADC) are written 0600 to a per-task scratch dir and wiped when the task ends; resolved values are redacted from logs.
Honest egress: off by default — no cloud credential is resolved unless the project opts in (.tfactory.yml egress.enabled). python -m tfactory_secrets.cli audit prints a secret-free manifest of exactly what would leave your network.

Run on any LLM

TFactory routes each pipeline phase to a provider purely from the model string — no separate provider switch. Supported: the Claude Agent SDK (primary), OpenAI Codex, Gemini CLI, GitHub Copilot CLI, Ollama (local), and any OpenAI-compatible endpoint (vLLM / LM Studio / OpenRouter / Together / Groq / LocalAI). This lets a team run on a flat-rate subscription, a self-hosted model, or fully air-gapped — with an honest data-egress badge (python apps/backend/byo_llm.py <model>) so you always know whether a run keeps data on your network. See guides/byo-llm.md.

Docs

Full project documentation is published at https://tfactory.freundcloud.com/.

Direct links:

Architecture — directory structure, workspace layout, dataflow
Showcase — the pipeline in action with real evidence
Design Plan — rationale, locked decisions, risk register
Technical Spec — per-component detail
Credentials — the Credential Broker and MFA
Progress — the per-task build log

In-repo guides (guides/):

guides/e2e-smoke.md — operator guide for the verification scenarios
guides/HANDOVER_WORKFLOW.md — how to trigger TFactory from a live Claude Code session
guides/CLAUDE_CODE_MCP_TOOLS.md — driving TFactory tasks from the MCP control plane
guides/byo-llm.md — run TFactory fully on your own infrastructure with a verifiable no-egress guarantee
guides/spec-sources.md — use TFactory without AIFactory: ingest any acceptance-criteria source (markdown / Gherkin / EARS)

Project tracking

Epic and sub-issues: https://github.com/olafkfreund/TFactory/issues
Discussions / questions: open an issue with the question label

License

MIT OR GPL-3.0.
