Execution plan for closing the gap between the current Agentboard kit and a credible multi-provider AI-workflow layer. Each phase is self-contained: it has a goal, a scope, a concrete task list, success criteria, and an effort estimate. Ship phases in order — each later phase assumes the earlier ones are in place.
Hard constraints inherited from CLAUDE.md (do not violate):
- No runtime dependencies (pure bash, git, sed, grep, awk only)
- No stack pre-picking in
init - Templates that ship verbatim must stay stack-agnostic
- Max ~300 lines per file
- Placeholders use
{{UPPERCASE_SNAKE}}
Any phase that would break one of these needs an explicit decision log entry first (see "Decisions that must precede work" at the bottom).
Goal: stop overselling "manages context across providers." Agentboard does not move conversation state between providers; it ships a shared project-truth format plus a read-order convention that all three CLIs respect. Own that honestly.
Scope: docs and CLI copy only. No code changes.
Tasks:
- Rewrite the README lead: "Shared work-state and project-truth for multi-provider AI workflows."
- Update
bin/agentboardhelp banner and--helptext to match. - Add a "How it actually works" section to README explaining the file-first
model: each provider's CLI auto-loads its entry file, all three entry files
point at the same
.platform/pack. - Audit
CHEATSHEET.mdfor the same framing.
Success criteria: a first-time reader of README can explain, in one sentence, what Agentboard does and does not do.
Effort: ~1 engineer-day.
Goal: make the existing surface trustworthy. Catch malformed state early, stop silent corruption on re-activation, replace hand-typed progress logs with real git-derived evidence.
Scope: bin/agentboard, lib/agentboard, templates/platform/ACTIVATE.md,
templates/platform/work/, templates/platform/domains/.
Tasks:
- Frontmatter schema check (pure bash, zero deps):
- Add
lib/agentboard/schema.shwith a tiny validator that reads required keys + allowed enums per file type (stream, domain, repo) using awk/grep. - Wire into
doctor,new-stream,new-domain. - Schemas live as plain text tables in
lib/agentboard/schemas/— no YAML parser dependency.
- Add
- Activation idempotency:
- Tag every section Agentboard generates with an HTML comment marker:
<!-- agentboard:section=decisions v=1 -->and closing marker. - Re-running activation replaces tagged sections in place instead of appending. User-authored content between markers is preserved.
- Add a
--dry-runflag that prints the diff without writing.
- Tag every section Agentboard generates with an HTML comment marker:
- Git-derived progress log:
- New subcommand
agentboard log <stream-slug>that runsgit diff --stat <base>..HEADscoped to stream files plus code touched on the stream branch, and appends a timestamped block to the stream's progress log section. - Retain the manual note field, but stop relying on the LLM to self-report edits.
- New subcommand
- Post-activation doctor gate:
- ACTIVATE.md's final step runs
agentboard doctorautomatically. Bad frontmatter fails activation loudly instead of surviving silently.
- ACTIVATE.md's final step runs
Success criteria:
doctorrejects a stream file missing required frontmatter keys.- Running activation twice on the same project produces a clean diff with no duplicated sections.
agentboard log <slug>output ends up in the stream file verbatim.
Effort: ~5 engineer-days.
Goal: give the human operator a live view of ACTIVE.md without opening
the file. Highest user-visible credibility gain per hour spent in the whole
roadmap.
Scope: one new subcommand, zero changes to the data model.
Tasks:
agentboard tui— read-only terminal dashboard that rendersACTIVE.mdas a sortable table (stream | type | status | owner | updated | branch).- Filter flags:
--status in-progress,--owner <name>,--repo <slug>. - Color coding from ANSI escapes only (no
gum, no external deps). - Auto-refresh every N seconds via plain
whileloop +clear.
Success criteria:
- On a project with 10+ streams,
agentboard tui --status in-progressshows only live work in under 200ms. - Works on stock macOS bash 3.2.
Effort: ~3 engineer-days.
Goal: close the honesty gap where Claude Code has the closure-gate hook and Codex/Gemini do not. Either reach parity or state plainly in the README that enforcement is Claude-only. Do not leave the current ambiguous state.
Scope: hooks or equivalent pre-tool interception in each CLI.
Tasks:
- Research phase (1 day): map Codex CLI and Gemini CLI extension/hook surfaces. Document in a decision log entry what each CLI supports.
- Codex parity: if Codex CLI exposes an MCP or pre-tool hook, port the
closure-gate logic from
platform-closure-gate.jsto the Codex surface. - Gemini parity: same for Gemini CLI's extension mechanism.
- Fallback path: if a CLI has no hook surface, ship a pre-commit git hook that enforces the same invariant at commit time. Less tight than runtime enforcement but better than nothing.
- Update README to state current enforcement coverage per provider honestly.
Success criteria:
- Attempting to set
closure_approved: trueon a stream that still has open done-criteria items is blocked in all three CLIs — or the README states which providers lack the gate and why.
Effort: ~5 engineer-days, mostly research and surface discovery.
Goal: today's handoff prints a fixed load order. It does not know that
Codex's context is smaller than Claude's. Give handoff a token budget so it
can tailor the pack per provider.
Scope: bin/agentboard handoff, stream/domain templates, new optional
summary files.
Tasks:
- Tier 1 — budget flag:
agentboard handoff <slug> --budget 8kincludes only the brief + stream file + primary domain, skipping secondary domains and repo refs when the estimated token count exceeds the budget. - Token estimation: bytes-to-tokens heuristic (1 token ≈ 4 bytes of English markdown) — good enough, zero deps.
- Tier 2 — per-stream summary file:
agentboard summarize <slug>writes<slug>.summary.mdnext to the stream file. Deterministic prompt shipped intemplates/platform/agents/summarize.md, executed by the active LLM in the session. Committed to git so summaries are auditable. - When
--budgetpressure is high,handoffprefers the summary over the full stream file. - Tier 3 (adaptive, per-provider tokenizer) is explicitly deferred — see Phase 6.
Success criteria:
agentboard handoff <slug> --budget 4kproduces a load order whose total byte size stays under the byte-equivalent of the budget.- A summary file, once committed, is preferred over the raw stream file when the budget is tight.
Effort: ~5 engineer-days.
Goal: today two agents editing the same stream file produce a silent merge conflict. Ship a concurrency primitive that Agentboard enforces, not just advises.
Scope: claim, release, doctor, stream file frontmatter.
Tasks:
- Decision first: choose between
flock(1)advisory-locks and the git-branch-per-stream model. Recommendation: git-branch-per-stream, because it matches the file-first philosophy and survives Dropbox/NFS vaults. - Enforce: editing
work/<slug>.mdrequires being on branchstream/<slug>(or a designated override branch).doctorfails the check if violated. agentboard claim <slug>creates/switches to the stream branch.agentboard release <slug>returns to the main/default branch and marks the stream as unclaimed inACTIVE.md.- Cross-agent signaling:
ACTIVE.mdowner column becomes the authoritative claim record. Multiple concurrent claims are caught bydoctorand by the Phase 3 pre-tool hook.
Success criteria:
- Two agents cannot both have
claimopen on the same stream withoutdoctorflagging it. - A stream edit on the wrong branch is blocked before write.
Effort: ~10 engineer-days including tests.
Goal: Tier 3 from Phase 4 — per-provider tailored handoff packs using real tokenizer math instead of a byte heuristic. This is the one genuinely hard problem on the roadmap. Do not start until Phases 0–5 are shipped and users are asking for it.
Scope: new per-provider token counters, cached token measurements per
file, handoff --provider claude|codex|gemini.
Open questions (resolve before starting):
- Where do tokenizer implementations live without violating the
no-runtime-deps rule? Candidates: ship as optional binaries users install
themselves; emit rough counts via a single shelled-out
python -conly when the user opts in. - How is the cache invalidated? Per-file mtime vs. content hash.
Effort: ~20+ engineer-days. Treat as a funded sprint, not an evening.
Tests: every phase ships with unit tests under tests/unit/ and extends
tests/integration.sh where the new surface is user-visible. No phase
merges without green tests.
Docs: README lead, CHEATSHEET, and the template that init drops at the
project root all stay in sync. A phase that changes user-visible behavior
must ship its own docs patch in the same change.
Backward compatibility: existing .platform/ folders from prior
Agentboard versions must keep working. Phase 1's schema check should warn
on legacy files, not fail. A migrate command already exists — extend it,
don't replace it.
Explicitly out of scope:
- A web UI. The "Phase 3 UI" mentioned in older design notes is not on this roadmap. A TUI (Phase 2) covers the operator case; a web UI is a separate product decision.
- Auto-ingestion from code (parsing AST, inferring domains from imports). Activation is LLM-driven on purpose — auto-ingestion fights that design.
- Cloud sync / hosted state. Agentboard stays local-first and git-as-transport.
These are prerequisites, not tasks. Each one needs a short decision-log
entry (in whatever decision log the repo adopts — decisions.md at the
root is fine) before the corresponding phase starts:
- Phase 1: schema format — shipped as bash-readable tables vs. a YAML file parsed by awk. Recommendation: bash tables for zero-dep purity.
- Phase 3: per-CLI enforcement strategy — runtime hook vs. git pre-commit fallback.
- Phase 5: concurrency primitive —
flock(1)vs. git-branch-per-stream. - Phase 6: tokenizer delivery — whether to relax the no-runtime-deps rule and how.
If forced to ship only the next two weeks of work, ship Phase 0 + Phase 1 + Phase 2 + Phase 4 Tier 1. That covers ~80% of the credibility gap with zero genuinely hard problems. Phases 3, 5, and 6 address real weaknesses but require research and decisions before code.