feat: DAG reassessment via boi plan + dispatch-many #14

Open
mrap wants to merge 3 commits into main from feat/dag-reassess

Conversation


mrap (Owner) commented Apr 29, 2026

Motivation

Multi-spec BOI tracks had to be re-chained manually via --after flags at
dispatch time. In this session's 5-spec track, ordering had to be verified by
hand — the wrong order only surfaces when a dependent task fails mid-run
because upstream work wasn't ready yet.

Before: whoever dispatches must remember in-flight deps + express them
via --after. Easy to mis-order. In-flight specs can expand scope after a
dependent is queued, silently breaking the assumed contract.

After: boi dispatch-many runs a DAG analysis + LLM critique before
any dispatch happens. Wrong orderings are flagged with suggested fixes.
boi dispatch gets a lightweight implicit-dep WARN for free.


Also: typed failure reasons + inline display in boi status (closes SO S12 gap)

Problem: An audit (2026-04-29) found that 8/8 recent failed specs had
spec.error = NULL. boi status only showed ✗ S1234 Title 2m ago, with
nothing about why; users had to dig through daemon logs to find out.
This broke SO S12 ("loud failures") at the user surface.

Fix: Structured FailureReason enum + persistence + rendering.

Typed enum (src/failure.rs)

pub enum FailureReason {
    ModelResolution { model: String, provider: String },
    ProviderRateLimit { provider: String, retry_after_s: u64 },
    ProviderHttp { provider: String, status: u16, body_excerpt: String },
    ProviderAuth { provider: String, env_var: String },
    Timeout { phase: String, secs: u64 },
    ToolError { phase: String, message: String },
    VerifyFailed { task: String, exit_code: i32, stderr_excerpt: String },
    WorkerCrash { phase: String, signal: Option<i32>, message: String },
    Other { message: String },
}

Helpers: short_summary() (one-line for status), detail() (multi-line for boi why).
Serialized as JSON to spec.error / task.error columns. Falls back to Other { message } for legacy NULL/string values.

Failure capture (src/queue.rs, src/worker.rs, src/runner.rs)

New helpers queue.fail_spec(id, FailureReason) and queue.fail_task(spec_id, task_id, FailureReason)
write the JSON reason to the DB and emit structured telemetry events (boi.spec.failed / boi.task.failed).

Every failure path now maps to a typed reason:

  • RuntimeError::Timeout → Timeout { ... }
  • NonZeroExit("HTTP 429") → ProviderRateLimit { ... }
  • NonZeroExit("HTTP 4xx/5xx") → ProviderHttp { ... }
  • Missing API key → ProviderAuth { ... }
  • Model not found → ModelResolution { ... }
  • Verify command failure → VerifyFailed { ... }
  • Subprocess SIGKILL/SIGSEGV → WorkerCrash { ... }
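Sketched roughly, the classification could look like the following (the RuntimeError shape, provider handling, and status parsing are illustrative assumptions, not the real src/runner.rs API):

```rust
// Illustrative mapping from runner errors to typed failure reasons.
#[derive(Debug, PartialEq)]
enum FailureReason {
    Timeout { phase: String, secs: u64 },
    ProviderRateLimit { provider: String },
    ProviderHttp { provider: String, status: u16 },
    Other { message: String },
}

enum RuntimeError {
    Timeout { phase: String, secs: u64 },
    NonZeroExit { stderr: String },
}

fn classify(err: RuntimeError, provider: &str) -> FailureReason {
    match err {
        RuntimeError::Timeout { phase, secs } => FailureReason::Timeout { phase, secs },
        RuntimeError::NonZeroExit { stderr } if stderr.contains("HTTP 429") => {
            FailureReason::ProviderRateLimit { provider: provider.into() }
        }
        RuntimeError::NonZeroExit { stderr } => {
            // Pull a status code out of "HTTP NNN ..." if one is present.
            let status = stderr
                .split("HTTP ")
                .nth(1)
                .and_then(|rest| rest.get(..3))
                .and_then(|code| code.parse::<u16>().ok());
            match status {
                Some(status) => FailureReason::ProviderHttp { provider: provider.into(), status },
                None => FailureReason::Other { message: stderr },
            }
        }
    }
}
```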

Status rendering (src/cli/status.rs)

Failed specs now render a second indented line with the error reason:

✗ SA015  My spec title            2m ago
    └─ Verify failed: exit 1 (stderr: assertion failed at line 42)

Long summaries are truncated with an ellipsis (…). Pass --verbose / -v for the multi-line detail().
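The truncation itself is simple; a hedged sketch (the display budget and glyphs here are assumptions, not the real src/cli/status.rs):

```rust
// Hypothetical second-line renderer: truncate the one-line summary to a
// character budget, appending an ellipsis when it is cut.
fn render_error_line(summary: &str, max_chars: usize) -> String {
    let body = if summary.chars().count() > max_chars {
        let cut: String = summary.chars().take(max_chars.saturating_sub(1)).collect();
        format!("{cut}…")
    } else {
        summary.to_string()
    };
    format!("    └─ {body}")
}
```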

boi why <spec-id> (src/cli/why.rs)

Fast forensics: prints the full FailureReason::detail() for any spec.

boi why SA015

Tests

  • failure_reason: all variants roundtrip, truncation, invalid-JSON fallback
  • failure_capture: every failure path produces typed reason; NO NULL errors
  • status_render_error: no-error → no second line; typed error → short summary; long → ellipsis; verbose → detail

Implementation

New commands

  • boi plan [specs...] — loads in-flight + queued specs from DB, builds
    a DAG, detects cycles, topologically sorts it, then runs an LLM critique
    (claude-haiku) that flags specs that should depend on each other but don't,
    wrongly-serial specs that could run in parallel, and scope contradictions.
    Critique is cached by hash(DAG topology + spec titles) to avoid
    re-spending tokens on identical state.

  • boi dispatch-many <spec1> <spec2> ... — runs plan first, then:

    • block-severity concerns → refuse + print concerns, exit non-zero
    • warn-severity → print proposed order + prompt (or --yes/--force)
    • clean → auto-dispatch in topological order with correct --after chain
  • boi why <spec-id> — explain the last failure for a spec from DB
    (uses the new failure.rs structured failure capture).
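The topological sort with loud cycle detection at the heart of boi plan can be sketched with Kahn's algorithm (spec ids and the edge representation here are illustrative, not the real src/cli/plan.rs types):

```rust
use std::collections::{HashMap, VecDeque};

// Kahn's algorithm: edges are (upstream, downstream) pairs, meaning the
// downstream spec must dispatch --after the upstream one. A cycle leaves
// some node with nonzero in-degree, so the output comes up short and we
// fail loudly instead of dispatching a broken order.
fn topo_sort(specs: &[&str], deps: &[(&str, &str)]) -> Result<Vec<String>, String> {
    let mut indegree: HashMap<&str, usize> = specs.iter().map(|s| (*s, 0)).collect();
    let mut edges: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(up, down) in deps {
        edges.entry(up).or_default().push(down);
        *indegree.get_mut(down).ok_or_else(|| format!("unknown spec {down}"))? += 1;
    }
    let mut ready: VecDeque<&str> = specs.iter().copied().filter(|s| indegree[s] == 0).collect();
    let mut order = Vec::new();
    while let Some(s) = ready.pop_front() {
        order.push(s.to_string());
        for &d in edges.get(s).into_iter().flatten() {
            let n = indegree.get_mut(d).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(d);
            }
        }
    }
    if order.len() != specs.len() {
        return Err("cycle detected in declared deps".into());
    }
    Ok(order)
}
```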

Lightweight single-dispatch gate

boi dispatch <spec> now runs deterministic implicit-dep detection (no LLM)
when in-flight specs exist. If artifact overlap is detected and no --after
was provided: WARN + suggest --after. Add --skip-plan to bypass.
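The deterministic check could look roughly like this (struct and field names are assumptions, not the real boi types): an in-flight spec that writes an artifact the new spec reads suggests a missing --after.

```rust
use std::collections::HashSet;

// Hypothetical per-spec artifact sets, as collect_artifacts might produce.
struct SpecArtifacts {
    id: String,
    writes: HashSet<String>,
    reads: HashSet<String>,
}

impl SpecArtifacts {
    fn new(id: &str, writes: &[&str], reads: &[&str]) -> Self {
        Self {
            id: id.into(),
            writes: writes.iter().map(|s| s.to_string()).collect(),
            reads: reads.iter().map(|s| s.to_string()).collect(),
        }
    }
}

/// In-flight specs whose written artifacts overlap the new spec's reads;
/// each hit is a candidate for a WARN + suggested --after.
fn implicit_deps<'a>(new: &SpecArtifacts, in_flight: &'a [SpecArtifacts]) -> Vec<&'a str> {
    in_flight
        .iter()
        .filter(|s| !s.writes.is_disjoint(&new.reads))
        .map(|s| s.id.as_str())
        .collect()
}
```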

Core modules

| File | What it does |
| --- | --- |
| src/cli/plan.rs | DAG model: collect_artifacts, detect_implicit_deps, topo-sort, cycle detection + LLM critique pass |
| src/cli/dispatch_many.rs | Gated multi-spec dispatch |
| src/cli/dispatch.rs | Lightweight implicit-dep WARN added to existing command |
| src/runtime/openrouter.rs | HTTP client for OpenRouter (haiku critique calls) |
| src/failure.rs | Structured failure-reason capture + FailureReason enum |
| src/cli/why.rs | boi why command |
| src/cli/status.rs | Error rendering under failed specs |

Test coverage

  • dag_build: empty queue, single spec, two-spec chain, fan-out, diamond,
    cycle detection (errors loud), implicit-dep detection
  • dispatch_many: correct --after chain for 3-spec implicit chain; cycle
    in declared deps → refusal; --force overrides warn but not block
  • dispatch_dag_warn: WARN emitted when artifact overlap with no --after
  • failure_reason: roundtrip, truncation, legacy fallback
  • failure_capture: no NULL errors on any failure path
  • status_render_error: rendering correctness + verbose mode

Example: before vs after (DAG)

Before (manual, fragile):

boi dispatch specs/a.yaml
boi dispatch --after=A specs/b.yaml   # must remember A writes the file B reads
boi dispatch --after=B specs/c.yaml   # must remember order

After (automatic):

boi dispatch-many specs/a.yaml specs/b.yaml specs/c.yaml
# → plan detects b depends on a, c depends on b
# → dispatches in order A → B (--after=A) → C (--after=B)

Do not merge — Mike reviews.

🤖 Generated with Claude Code

mrap and others added 3 commits April 29, 2026 16:54
- Add deterministic runtime (builtin:commit, builtin:merge, builtin:cleanup)
  that skips Claude spawn entirely — cold-start win for post-task phases
- Add spec-critique ↔ spec-improve loop (separate Claude sessions, max 3 rounds)
  replacing the old spec-review phase
- Add commit/merge/cleanup phase TOMLs wired to deterministic builtins
- Wire mode.v2 in pipelines.toml with spec_pre_phases / spec_post_phases
- Add end-to-end v2 smoke test (tests/v2_smoke.rs)
- Update README, SKILL.md, and docs/pipelines.md for v2 mode

v1 modes (default, challenge, discover, generate) are untouched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add spawns_per_tick config field (default 4) to cap workers spawned per tick
- Rewrite daemon dequeue loop: drain up to spawns_per_tick per tick with 50-150ms jitter
- SIGHUP handler: live-reload max_workers/spawns_per_tick/claude_bin without restart
- Add `boi daemon reload` subcommand (sends SIGHUP to daemon.pid)
- Add try_load()/try_load_from() for fallible config parsing (bad config = no-op)
- docs/daemon.md: tick cadence, spawns_per_tick semantics, hot-reload behavior
- 14 new tests: daemon_batch (8) + daemon_hotreload (6), all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds automatic dependency analysis to boi dispatch flow so ordering
mistakes are caught mechanically before a spec queue goes wrong.

Core additions:
- src/cli/plan.rs: DAG model (collect_artifacts, detect_implicit_deps,
  topological sort, cycle detection) + LLM critique pass via claude-haiku.
  Critique is cached by hash(topology + titles) to avoid re-spending tokens.
- src/cli/dispatch_many.rs: boi dispatch-many — accepts N specs, runs plan
  first, gates dispatch on critique result (block=refuse, warn=prompt,
  clean=auto-approve). Dispatches in topological order with --after chain.
- src/cli/dispatch.rs: lightweight implicit-dep WARN on single dispatch when
  no --after is provided. Full LLM pass reserved for plan/dispatch-many.
- src/runtime/openrouter.rs: HTTP client for OpenRouter (haiku critique calls).
- src/failure.rs: structured failure-reason capture for runner diagnostics.
- src/cli/why.rs: boi why <spec-id> — explain last failure from DB.

Documentation:
- README.md: boi plan + dispatch-many sections with examples
- docs/dag-reassess.md: model explanation + command selection guide
- SKILL.md: updated CLI table

Tests: dag_build (empty/single/chain/fan-out/diamond/cycle/implicit-dep),
dispatch_many (right --after chain, cycle refusal, --force behaviour),
dispatch_dag_warn (warn-on-implicit-dep for single dispatch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>