AI Kanban Agent Orchestrator

An autonomous, state-driven multi-agent workflow system built on top of a kanban board.

This project turns a GitHub Projects board (or Trello) into an asynchronous control plane where AI agents act as specialized SDLC roles (Senior Engineer, Implementer, Code Reviewer, QA, Gate Checker, Estimator, Merge Resolver), automatically progressing tickets through a structured pipeline.

The board is the human-facing surface. The orchestrator is the automation brain. Agents execute work in isolated git worktrees.

Vision

Use a Kanban board as an orchestration layer for autonomous AI agents.

A ticket moves into a "Ready for" state -> a specific AI role is triggered -> the agent does the work -> transitions the ticket to the next state -> optionally waits for human approval -> continues.

The board becomes:

Human review surface
Approval gate
Planning document repository
Workflow trigger engine

The orchestrator becomes:

State machine
Agent runtime
Safety layer

Core Principles

The kanban board is the human UI.
All planning text lives on the card/issue.
Agents are stateless and run in isolated git worktrees.
State transitions are deterministic.
Human approval is first-class.
Infrastructure cost approaches zero at idle.

Architecture

Execution modes:
  --mode init                                 (scaffold ./.aiboard/ for a new project)
  --mode agent --card-id N                    (direct, single card)
  --mode polling --board-id N                 (automatic, priority-sorted pickup)
  --mode metrics [--card-id N | --since 7d]   (operational metrics report)
  --mode validation --board-id N              (read-only workflow vs. board check)
  --mode diagnose --card-id N                 (explain why a card isn't being picked up)
  --mode scaffold-board --board-id N [--apply] (create missing fields/labels via gh)
  --install                                   (install bundled Claude Code skill(s) into ~/.claude/skills/; combinable with any --mode)

Conversational front-end (optional):
  /aiboard                                    (Claude Code skill bundled in skills/aiboard/, install once via `aiboard --install`)

AgentRunner flow (agent_run states):
  Fetch card → move to IN_PROGRESS column
  → create git worktree, resolve cross-references
  → write task files + comments file
  → [optional] create Docker container session (ISessionableAgentExecutor)
  → execute steps sequentially; each step routed through session (docker exec) or direct executor
  → optional gate check (lightweight Haiku validation) — also routed through session
  → optional specialist reviews — also routed through session
  → dispose session (docker stop + docker rm)
  → post-process: upsert step comments, handle git, move to outcome column

MergeRunner flow (system_merge states):
  Fetch card → find work branch → merge to main (--no-ff)
  → on conflict: abort + transition to MERGE_CONFLICT target
  → on success: push, cleanup branch, move to Done

CompletionRunner flow (children_complete states):
  Poll child cards → check all reached terminal state → move parent to Done

Event-driven parent completion (completeParentIfReady):
  Child task reaches Done → check parent's siblings → all done? → transition parent to Done
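
The event-driven parent completion flow above can be sketched as a pure check over the child cards. This is a minimal Python sketch, not the orchestrator's C# implementation: the Card type, the TERMINAL_COLUMNS set, and parent_ready_to_complete are hypothetical names, and treating a parent with no children as "not ready" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: "Done" is the only terminal column, as in the board table.
TERMINAL_COLUMNS = {"Done"}


@dataclass(frozen=True)
class Card:
    card_id: int
    column: str
    parent_id: Optional[int] = None


def parent_ready_to_complete(parent_id: int, cards: List[Card]) -> bool:
    """Return True when the parent has children and all of them are terminal."""
    children = [c for c in cards if c.parent_id == parent_id]
    # Assumption: a parent with no children is never auto-completed.
    return bool(children) and all(c.column in TERMINAL_COLUMNS for c in children)


cards = [
    Card(1, "Waiting for Tasks"),
    Card(2, "Done", parent_id=1),
    Card(3, "Testing", parent_id=1),
]
print(parent_ready_to_complete(1, cards))  # False: card 3 is still in Testing
```

Keeping the check free of board calls would let the same function serve both the polling CompletionRunner and the event-driven path, though whether the project shares code this way is not stated in the source.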

Board Structure (GitHub Projects)

#	Column	Role(s)	Gate Type
1	Backlog	--	Manual entry
2	Ready for Design	Senior Engineer + Estimator (4 steps + optional specialist reviews)	agent_run
3	Designing	--	In-progress
4	Design Questions	--	Holding (NEEDS_INFO)
5	Designed	--	Manual gate
6	Ready for Tasking	Senior Engineer (1 step: decompose story into tasks)	agent_run
7	Tasking	--	In-progress
8	Waiting for Tasks	--	Holding (event-driven via completeParentIfReady)
9	Ready for Implementation	Implementer + Code Reviewer (2 steps + optional specialist reviews)	agent_run
10	Implementing	--	In-progress
11	Implementation Questions	--	Holding (NEEDS_INFO)
12	Ready for Test	QA + Doc Updater (2 steps + optional specialist reviews)	agent_run
13	Testing	--	In-progress
14	Tested	--	Manual gate
15	Approved	--	system_merge
16	Merging	--	In-progress
17	Done	--	Terminal
18	Error	--	Holding

Roles

Role	Model	Purpose
`senior_engineer`	claude-opus-4-6	Design pipeline (3 steps: review related tickets, create design, review conflicts)
`implementer`	claude-sonnet-4-6	Code implementation (uses senior_engineer system prompt)
`code_reviewer`	claude-sonnet-4-6	Post-implementation code review
`qa`	claude-opus-4-6	Test validation
`gate_checker`	claude-haiku-4-5-20251001	Lightweight gate checks after design, implementation, and test
`estimator`	claude-haiku-4-5-20251001	Ticket estimation (design step 4, calibration-based sizing)
`specialist_reviewer`	claude-sonnet-4-6	On-demand specialist reviews requested by gate checks
`senior_specialist_reviewer`	claude-opus-4-6	High-stakes specialist reviews (legal, compliance, privacy)
`merge_resolver`	claude-sonnet-4-6	Merge conflict resolution

How It Works

Operator creates an issue, adds it to the project board in Backlog. Issues are labeled type:story, type:task, or type:bug to indicate their card type.
Operator writes requirements/scope and moves card to Ready for Design.
Operator runs: .\scripts\run_once.ps1 -CardId 3 (or uses --mode polling for automatic pickup)
Design runs 4 steps: review related tickets -> create technical design -> review for cross-ticket conflicts -> estimate ticket size. Gate check validates output and may trigger optional specialist reviews. Card moves to Designed with estimate written to board field. If the card has a parent story, the story's estimate is recalculated as the sum of its children.
Operator reviews design. For user stories, approves by moving to Ready for Tasking. For tasks/bugs, approves by moving to Ready for Implementation.
(Stories only) Tasking decomposes the story into child tasks. Each task inherits the story's priority, gets a best-guess estimate, and is placed in Ready for Design. The story moves to Waiting for Tasks and completes automatically when all child tasks reach Done.
Implementation runs 2 steps: implement code (Sonnet) -> code review (Sonnet). Gate check validates output and may trigger optional specialist reviews. Card moves to Ready for Test.
Test runs 2 steps: QA agent validates implementation -> documentation agent updates memory bank if needed. Gate check validates output (including doc updates) and may trigger optional specialist reviews. Moves to Tested on success.
Operator approves by moving to Approved. System auto-merges the work branch into main and moves the card to Done.

If the agent needs more information, the card moves to a Questions column with questions posted as a comment. The operator answers and moves the card back to re-trigger.

On error, the card moves to Error with details posted as a comment.
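
The estimate rollup mentioned in the design step above (a story's estimate is recalculated as the sum of its children) amounts to a small reduction. The sketch below is illustrative Python rather than the project's code; rollup_estimate is a hypothetical name, and skipping children that have no estimate yet is an assumption.

```python
from typing import Iterable, Optional


def rollup_estimate(child_estimates: Iterable[Optional[int]]) -> int:
    """Sum child estimates for the parent story."""
    # Assumption: children without an estimate yet contribute nothing.
    return sum(e for e in child_estimates if e is not None)


# Three estimated children and one that has not been sized yet.
print(rollup_estimate([1, 2, None, 8]))  # 11
```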

Agent Contract

Each agent returns structured JSON output:

{
  "outcome": "COMPLETE | NEEDS_INFO | ERROR",
  "detail": "GitHub-flavored markdown summary",
  "questions": [
    {
      "question": "What is the target component?",
      "recommendations": ["Auth module", "API gateway"]
    }
  ]
}

The orchestrator:

Posts the detail as a markdown comment on the issue (one comment per step).
Transitions the card to the next column based on outcome.
Agents do NOT move cards directly -- the orchestrator controls all transitions.
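
A consumer of this contract would typically validate the JSON before acting on it. The following is a minimal Python sketch of such a check, assuming the three outcome values shown above; parse_agent_output is a hypothetical helper, and rejecting a NEEDS_INFO result that carries no questions is an assumption rather than documented behavior.

```python
import json

VALID_OUTCOMES = {"COMPLETE", "NEEDS_INFO", "ERROR"}


def parse_agent_output(raw: str) -> dict:
    """Parse one agent result and raise ValueError if its shape is wrong."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("agent output must be a JSON object")

    outcome = data.get("outcome")
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")

    questions = data.get("questions") or []
    # Assumption: NEEDS_INFO without any question is treated as malformed.
    if outcome == "NEEDS_INFO" and not questions:
        raise ValueError("NEEDS_INFO requires at least one question")

    return {
        "outcome": outcome,
        "detail": data.get("detail", ""),
        "questions": questions,
    }


result = parse_agent_output('{"outcome": "COMPLETE", "detail": "Design posted."}')
print(result["outcome"])  # COMPLETE
```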

Workflow Configuration

File-based config (workflow.github.json) maps columns to roles and transitions:

{
  "states": {
    "Ready for Design": {
      "name": "Ready for Design",
      "gateType": "agent_run",
      "gitBehavior": "discard",
      "pipelineOrder": 1,
      "providerParams": { "effort": "max" },
      "steps": [
        { "name": "review_related_tickets", "role": "senior_engineer", "taskPromptFile": "prompts/states/steps/review_related_tickets.md" },
        { "name": "create_design", "role": "senior_engineer", "taskPromptFile": "prompts/states/ready_for_design.md" },
        { "name": "review_design_conflicts", "role": "senior_engineer", "taskPromptFile": "prompts/states/steps/review_design_conflicts.md" },
        { "name": "estimate_ticket", "role": "estimator", "taskPromptFile": "prompts/states/steps/estimate_ticket.md" }
      ],
      "gateCheck": {
        "role": "gate_checker",
        "taskPromptFile": "prompts/gates/post_design.md"
      },
      "transitions": {
        "IN_PROGRESS": "Designing",
        "COMPLETE": [
          { "type": "moveToColumn", "value": "Designed" },
          { "type": "setField", "field": "Estimate", "value": "{{estimation}}" }
        ],
        "NEEDS_INFO": "Design Questions",
        "ERROR": "Error",
        "GATE_FAIL": "Ready for Design"
      }
    }
  },
  "roles": {
    "senior_engineer": {
      "model": "claude-opus-4-6",
      "systemPromptFile": "prompts/senior_engineer.md",
      "sections": ["Technical Design", "Decisions", "Implementation"]
    }
  },
  "polling": {
    "priorityFieldName": "priority",
    "priorityOrder": ["P0", "P1", "P2"]
  },
  "estimation": {
    "calibrationTicketId": "34",
    "calibrationSize": 1,
    "fieldName": "Estimate",
    "scale": [1, 2, 4, 8]
  },
  "dependencyPolicy": {
    "enabled": false,
    "enforcedStates": ["Ready for Implementation", "Ready for Test", "Approved"],
    "satisfiedColumns": ["Done"],
    "commentOnBlocked": true
  },
  "cardTypes": {
    "story": { "name": "User Story", "labelPrefix": "type", "allowedChildren": ["task"] },
    "task": { "name": "Task", "allowedChildren": [] }
  },
  "cardTypeField": "Type"
}

steps array defines sequential agent invocations within a state (each with its own role and prompt)
gateCheck runs a lightweight agent after all steps complete to validate output
providerParams passes executor-specific flags (e.g., effort for Claude CLI)
gitBehavior: discard (design/test/tasking), commit_and_push (implementation)
taskPromptFile / systemPromptFile point to markdown files under prompts/
pipelineOrder determines polling priority (higher = picked first)
transitions values can be a string (column name) or an array of actions (moveToColumn, setField, updateParentSum)
estimation configures calibration-based ticket sizing (scale, calibration ticket, board field)
dependencyPolicy enables blocking dependencies. When enabled, polling skips blocked cards and direct agent/merge runs refuse them until blockers are in satisfiedColumns (or closed when the provider exposes only issue state)
cardTypes defines card type hierarchy for child task generation (e.g., stories → tasks). labelPrefix is optional — set it to apply {prefix}:{typeKey} labels, or omit/null it to opt that type out of labels entirely
cardTypeField (optional, workflow-level) — name of a project field (e.g. "Type") that holds each card's type. When set, generated children write their CardTypeDefinition.Name to this field, and parent-type lookups for allowedChildren enforcement read from this field first and fall back to labels. Labels and fields can be used together; each cardTypes[k] must have either a labelPrefix or a global cardTypeField
generationConfig on a step specifies child ticket creation: targetType, targetColumn, linkToParent, copyFields (fields to inherit from parent, e.g., priority), setFields (literal field→value map applied to the new card — useful for putting children into a specific pipeline stage, e.g. {"Type": "Task", "Activity": "Design"}; wins over copyFields on key collision)
updateParentSum transition action recalculates a parent card's field as the sum of its children's values (used for estimate rollup)
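
Because a transitions value can be either a column name or an array of actions, a consumer may find it convenient to normalize both shapes into one list. The Python sketch below shows one way to do that; normalize_transition is a hypothetical helper, and treating a bare string as shorthand for a single moveToColumn action is an assumption based on the description above.

```python
def normalize_transition(value):
    """Return a list of action dicts for a string or array transition value."""
    if isinstance(value, str):
        # Assumption: a bare column name means one moveToColumn action.
        return [{"type": "moveToColumn", "value": value}]
    if isinstance(value, list):
        return [dict(action) for action in value]
    raise TypeError(f"unsupported transition value: {value!r}")


# Values taken from the "Ready for Design" example above.
transitions = {
    "COMPLETE": [
        {"type": "moveToColumn", "value": "Designed"},
        {"type": "setField", "field": "Estimate", "value": "{{estimation}}"},
    ],
    "NEEDS_INFO": "Design Questions",
}

for outcome, value in transitions.items():
    print(outcome, [action["type"] for action in normalize_transition(value)])
```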

See docs/CardTypesAndGeneration.md for a walkthrough of label-based vs. field-based type discrimination, generationConfig.setFields, and dependency front matter on generated tickets.
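
The dependencyPolicy rule described above (a card is blocked until every blocker sits in one of satisfiedColumns) can be illustrated with a short Python sketch. The function name blocking_cards and the blocker_columns mapping of blocker card ID to current column are hypothetical; only the policy keys come from the config example.

```python
def blocking_cards(blocker_columns: dict, policy: dict, target_state: str) -> list:
    """Return the blocker IDs that still prevent a run in target_state."""
    if not policy.get("enabled", False):
        return []
    if target_state not in policy.get("enforcedStates", []):
        return []
    satisfied = set(policy.get("satisfiedColumns", []))
    return sorted(cid for cid, col in blocker_columns.items() if col not in satisfied)


example_policy = {
    "enabled": True,
    "enforcedStates": ["Ready for Implementation", "Ready for Test", "Approved"],
    "satisfiedColumns": ["Done"],
}

# Card 12 is finished, card 15 is not, so only 15 still blocks.
print(blocking_cards({12: "Done", 15: "Testing"}, example_policy, "Ready for Implementation"))
```

An empty result means the card may run; a non-empty one gives the IDs a "blocked" comment could list when commentOnBlocked is true.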

Tech Stack

Layer	Technology
Board provider	GitHub Projects v2 (via `gh` CLI) or Trello (REST API)
Board abstraction	`ITaskBoardClient` interface
Orchestrator	C# / .NET 10
Agent executors	Six implementations behind `IAgentExecutor` — sandboxed by default (`docker-claude-cli`, `docker-codex`, `docker-opencode`, `docker-claude-qwen`) plus host CLI variants (`claude-cli`, `codex`) gated behind `--unsafe`. See Agent.md §7 for the full provider table, or per-sandbox how-tos: docs/DockerSandbox.md, docs/CodexSandbox.md, docs/OpenCodeSandbox.md, docs/ClaudeQwenSandbox.md
Multi-agent candidate evaluation	`CandidateExecutor` — opt-in per step. Runs N agents in parallel against the same task, an evaluator picks a winner, the winner's branch is promoted, and per-(role, provider) win-rate + quality-score + cost / token / structurer-fallback / evaluator-reliability / re-run-fast-path-hit metrics accumulate. See docs/CandidateEvaluation.md
Named-resource concurrency pool	`IResourcePool` / `ResourcePool` — opt-in via `ResourcePool` config section. Operators declare named resources (e.g. a single shared local llama.cpp server) with concurrency caps, and tag providers that need them. Wired around every executor invocation site so cross-provider candidates can't stomp on a shared backend. See Agent.md §10.3.
Agent sandbox image (Claude)	`docker/agent-sandbox/Dockerfile` — node:22-slim + Claude CLI + git + ripgrep; `aiboard-agent-sandbox:latest`
Agent sandbox image (Codex)	`docker/codex-sandbox/Dockerfile` — node:22-slim + Codex CLI + git + ripgrep; `aiboard-codex-sandbox:latest`
Git isolation	Git worktrees (`GitWorkspaceManager`)
Task files	`.aiboard/tasks/{id}.md` (ephemeral, gitignored)

For LLM coding agents

If you're an AI coding agent setting this system up for the first time, read Agent.md before anything else. It's a single-file guide written specifically for LLM consumption that covers the architecture, the JSON schemas for appsettings.json and workflow.*.json, every available agent executor and model with pros/cons + when-to-use guidance, the role catalog, multi-agent candidate evaluation, and a step-by-step setup flow for a new project. The file ships in the release distribution alongside aiboard.exe.

A Claude Code skill (skills/aiboard/SKILL.md) is bundled with the distribution. Install it once per machine with aiboard --install (copies every bundled skill into ~/.claude/skills/<name>/). After that, /aiboard in any Claude Code conversation routes you through the right aiboard --mode ... invocation — the skill detects your project state and picks the right mode. The --install flag can also be combined with another mode (e.g. aiboard --install --mode init) to install the skill and continue with the requested work. See skills/README.md for details.

For a narrative human-targeted walkthrough of the onboarding lifecycle (init → validation → scaffold-board → polling, plus diagnose for stuck cards), see docs/Onboarding.md.

Quick Start

Install (released binary — recommended for non-developers)

Each release on the Releases page ships two flavors per platform:

aiboard-{rid}.zip / .tar.gz — self-contained single-file (default). Bundles the .NET runtime; runs on any host without a separate install. ~33 MB compressed.
aiboard-{rid}-fdd.zip / .tar.gz — framework-dependent. Requires the .NET 10 runtime on PATH. ~5–10 MB compressed.

Where {rid} is one of win-x64, linux-x64, osx-x64, osx-arm64. If you don't know which to pick, use self-contained — it's the path most users want.

See QUICKSTART.md for what to do after extracting.

Prerequisites (when building from source)

.NET 10 SDK
Docker (for local PostgreSQL)
gh CLI authenticated with project + repo scopes
claude CLI installed and authenticated

Start the database

docker compose up -d

This launches PostgreSQL on localhost:5432, runs Flyway migrations automatically, and starts a Grafana instance on http://localhost:3000 (admin/admin) with a pre-provisioned metrics dashboard. The default connection string in appsettings.json connects to this local instance.

Build the agent sandbox image (optional)

Only needed if using Docker-based agent execution. See docs/DockerSandbox.md for the full enable-and-verify how-to.

.\scripts\build-sandbox.ps1

Or via docker compose:

docker compose --profile build up agent-sandbox

Build args: -BaseImage, -AgentUid, -AgentGid, -ClaudeCliVersion, -Tag, -NoCache.

To enable the sandbox at runtime, set AGENT_EXECUTOR=docker-claude-cli. It is off by default.

For dependency-heavy projects on Docker Desktop Windows, opt into Docker named-volume overlays for hot cache directories so test runners do not traverse thousands of small files through the host bind mount:

"DockerAgents": {
  "Claude": { "PerformanceVolumes": ["node_modules", ".pnpm-store"] }
}

PerformanceVolumes is available on all Docker agents. Use it only for reproducible dependency/cache paths, not source or commit-required build outputs.

Build the Codex sandbox image (recommended, for sandboxed Codex)

A separate sandbox wraps the OpenAI Codex CLI for sandboxed use against a real codebase. Filesystem isolation is provided by Docker, so the agent runs with --yolo by default — fast, autonomous, and contained. See docs/CodexSandbox.md for the full setup.

.\scripts\build-codex-sandbox.ps1

The executor is auto-registered when Docker is detected, under the provider key docker-codex. Migrate workflow roles from codex (host) to docker-codex to keep them usable without --unsafe.

Build the OpenCode sandbox image (optional, for local-LLM roles)

A separate sandbox wraps the OpenCode CLI for use with a local llama.cpp server (e.g. Qwen3.6 served by the sibling local-llm compose project). Use it when you want to route low-stakes roles (gate checks, estimation, simple reviews) to a free local model. See docs/OpenCodeSandbox.md for the full setup, including role-suitability guidance.

.\scripts\build-opencode-sandbox.ps1

Requires the llm-net Docker network (owned by the local-llm project) before any role routes to it. Enable at runtime with AGENT_EXECUTOR=docker-opencode.

Use the Claude CLI against the local Qwen server (optional, for schema-enforced local runs)

A fourth sandboxed executor, docker-claude-qwen, runs the regular Claude CLI in the same aiboard-agent-sandbox image but redirects it to the local llama.cpp proxy via ANTHROPIC_BASE_URL. Headline benefit: server-side --json-schema enforcement. The proxy translates the schema to a tool-call constraint that llama.cpp enforces during generation. This is useful when output reliability matters (e.g. the candidate-evaluation evaluator). See docs/ClaudeQwenSandbox.md.

$env:AGENT_EXECUTOR = "docker-claude-qwen"

No separate image build is needed for docker-claude-qwen: it reuses the Claude sandbox image built above. Pair it with docker-opencode in a candidate group to A/B them on real workloads.

Add project-specific tooling to the sandbox (optional)

If your project's agent work needs a runtime that the upstream sandbox doesn't ship (a game engine, a JVM, a specific compiler, etc.), don't fork the upstream Dockerfile. Overlay it: a tiny FROM aiboard-X-sandbox:latest Dockerfile in your project repo, retag, and point DockerAgents:*:ImageName at the new tag. See docs/ProjectOverlays.md for the pattern, build-script template, and a worked Godot example.

Run an agent on a card

.\scripts\run_once.ps1 -CardId 3

Run in polling mode (automatic pickup)

.\scripts\run_polling.ps1
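
How polling might choose its next card can be sketched from the workflow config keys pipelineOrder and polling.priorityOrder. This Python sketch is an assumption-heavy illustration, not the orchestrator's algorithm: the source does not say how the two keys combine, so ranking by pipelineOrder first, then by board priority, then by card ID is a guess, and the pipelineOrder value for "Ready for Test" is made up for the example.

```python
def pickup_order(cards, states, priority_order):
    """Sort candidate cards: higher pipelineOrder first, then board priority."""
    def sort_key(card):
        pipeline = states[card["column"]]["pipelineOrder"]
        try:
            rank = priority_order.index(card.get("priority"))
        except ValueError:
            rank = len(priority_order)  # unknown or missing priority sorts last
        # Assumption: ties fall back to the lowest card ID.
        return (-pipeline, rank, card["id"])

    # Only cards sitting in a configured state are eligible for pickup.
    eligible = [c for c in cards if c["column"] in states]
    return sorted(eligible, key=sort_key)


example_states = {
    "Ready for Design": {"pipelineOrder": 1},
    "Ready for Test": {"pipelineOrder": 3},  # illustrative value
}
example_cards = [
    {"id": 1, "column": "Ready for Design", "priority": "P0"},
    {"id": 2, "column": "Ready for Test", "priority": "P2"},
    {"id": 3, "column": "Ready for Test", "priority": "P1"},
    {"id": 4, "column": "Designing", "priority": "P0"},
]

ordered = pickup_order(example_cards, example_states, ["P0", "P1", "P2"])
print([c["id"] for c in ordered])  # [3, 2, 1]
```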

Graceful shutdown (polling and queue modes)

Press Ctrl+C once to request a graceful shutdown — the runner finishes the current card and exits cleanly. Press Ctrl+C twice to force quit immediately (may leave a card stuck in an in-progress column).
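
The two-stage Ctrl+C behavior described above follows a common pattern: the first interrupt sets a flag that the loop checks between cards, and the second exits immediately. The Python sketch below illustrates that pattern only; ShutdownController, run_polling_loop, and install_handler are hypothetical names, and the real runner is written in C#.

```python
import signal


class ShutdownController:
    """Tracks whether a graceful stop or a forced quit was requested."""

    def __init__(self):
        self.stop_requested = False

    def handle_interrupt(self, signum=None, frame=None):
        if self.stop_requested:
            # Second interrupt: quit now, possibly mid-card.
            raise SystemExit(130)
        # First interrupt: let the current card finish, then stop.
        self.stop_requested = True


def install_handler(controller):
    """Route SIGINT (Ctrl+C) to the controller."""
    signal.signal(signal.SIGINT, controller.handle_interrupt)


def run_polling_loop(controller, fetch_next_card, process_card):
    """Process cards until a stop is requested or no card is available."""
    processed = []
    while not controller.stop_requested:
        card = fetch_next_card()
        if card is None:
            break  # a real poller would sleep and retry instead
        process_card(card)
        processed.append(card)
    return processed
```

Because the flag is only checked between cards, the card that was running when Ctrl+C arrived still completes its post-processing, which is what keeps it from being left in an in-progress column.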

Manual invocation with env vars

$env:BOARD_PROVIDER = "github"
$env:AGENT_EXECUTOR = "claude-cli"
$env:WORKFLOW_CONFIG_PATH = "workflow.github.json"
$env:GitHubProjects__Owner = "YourGitHubUser"
$env:GitHubProjects__Repo = "YourGitHubUser/your-repo"
$env:GitHubProjects__ProjectNumber = "1"

# Single card
dotnet run --project lambda/src/TaskBoard.Worker -- --mode agent --card-id 3 --board-id 1 --workspace .

# Polling (auto-pickup highest priority card from "Ready for" columns)
dotnet run --project lambda/src/TaskBoard.Worker -- --mode polling --board-id 1 --workspace .

# Metrics report (all time)
dotnet run --project lambda/src/TaskBoard.Worker -- --mode metrics

# Metrics report (last 7 days)
dotnet run --project lambda/src/TaskBoard.Worker -- --mode metrics --since 7d

# Metrics report (single card)
dotnet run --project lambda/src/TaskBoard.Worker -- --mode metrics --card-id 3
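
A relative window such as --since 7d has to be turned into a duration before it can filter runs. The Python sketch below shows one plausible parsing approach; parse_since is a hypothetical helper, only the d suffix appears in the source, and support for an h (hours) suffix is an assumption added for illustration.

```python
import re
from datetime import timedelta

# "d" comes from the documented example; "h" is an assumed extension.
_UNITS = {"h": "hours", "d": "days"}


def parse_since(value: str) -> timedelta:
    """Convert a string such as '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported --since value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_since("7d"))  # 7 days, 0:00:00
```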

Safety Model

Sandboxed by default. Only Docker-based providers (docker-claude-cli, docker-codex, docker-opencode, docker-claude-qwen) are usable out of the box. The agent runs inside a container with a read-only base .git and no host filesystem access beyond the worktree.
Host CLI agents require --unsafe. The claude-cli and codex providers run on the host with full credentials and filesystem access; workflows that reference them refuse to start unless you pass --unsafe (or set Unsafe: true in config). Migrate to docker-claude-cli / docker-codex to keep working without the flag.
Agents cannot transition state directly -- the orchestrator validates all transitions.
Manual approval gates block progression until a human moves the card.
Automated gate checks (Haiku) validate agent output before state transitions.
Agent execution happens in isolated git worktrees; the main repo is never modified.
System merge includes conflict detection with retry logic (max 3 attempts).
All agent output is posted as upserted comments per step (no spam).

Multi-Tenant Database

Multiple projects (different GitHub repos/projects, Trello boards) can share one Postgres instance without conflicting. Every per-tenant table (agent_run, step_result, card_state, processed_events) carries a tenant_id column as the leading column of its primary key; the metrics views expose it, and the stores filter on it.

The tenant identifier is {provider}:{identifier}, for example github:owner/repo/4 or trello:abc123. It is resolved from the merged configuration at startup (GitHubProjects:Owner/Repo/ProjectNumber or Trello:BoardId); a missing required value fails startup with a diagnostic that names the offending key. Each worker process is single-tenant, so to run two projects against one DB, run two processes with different .aiboard/ configs.

Docker container names also embed an 8-char tenant hash (aiboard-{hash}-{cardId}-…) so concurrent runs across tenants with the same numeric card ID never collide.
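
The tenant identifier and the hashed container-name prefix can be illustrated as follows. This is a Python sketch under stated assumptions: tenant_id and container_name_prefix are hypothetical helpers, and the source does not say which hash function produces the 8 characters, so the truncated SHA-256 hex digest used here is a stand-in.

```python
import hashlib


def tenant_id(provider: str, identifier: str) -> str:
    """Build the {provider}:{identifier} tenant key."""
    if not provider or not identifier:
        raise ValueError("provider and identifier are both required")
    return f"{provider}:{identifier}"


def container_name_prefix(tenant: str, card_id: int) -> str:
    """Build the aiboard-{hash}-{cardId} prefix of a container name."""
    # Assumption: SHA-256 truncated to 8 hex chars; the real hash is not documented here.
    digest = hashlib.sha256(tenant.encode("utf-8")).hexdigest()[:8]
    return f"aiboard-{digest}-{card_id}"


github = tenant_id("github", "owner/repo/4")
trello = tenant_id("trello", "abc123")

# Same numeric card ID, different tenants, different container prefixes.
print(container_name_prefix(github, 3))
print(container_name_prefix(trello, 3))
```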

PGMQ queues are not yet tenant-scoped (queue mode is legacy/secondary).

Cost Strategy

Idle cost: $0. Costs scale only when an agent executes (LLM tokens are the primary cost driver).

Future Expansion

Webhook-triggered automation (currently manual CLI or polling)
PR creation automation
SLA timers / retry policies
Visual dashboard

Non-Goals (v1)

Replacing the board UI
Full CMS for workflow editing
Autonomous production deployment
Complex RBAC
Multi-repo orchestration
