feat: DAG reassessment via boi plan + dispatch-many #14
Open
mrap wants to merge 3 commits into
- Add deterministic runtime (builtin:commit, builtin:merge, builtin:cleanup) that skips Claude spawn entirely, a cold-start win for post-task phases
- Add spec-critique ↔ spec-improve loop (separate Claude sessions, max 3 rounds) replacing the old spec-review phase
- Add commit/merge/cleanup phase TOMLs wired to deterministic builtins
- Wire mode.v2 in pipelines.toml with spec_pre_phases / spec_post_phases
- Add end-to-end v2 smoke test (tests/v2_smoke.rs)
- Update README, SKILL.md, and docs/pipelines.md for v2 mode

v1 modes (default, challenge, discover, generate) are untouched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add spawns_per_tick config field (default 4) to cap workers spawned per tick
- Rewrite daemon dequeue loop: drain up to spawns_per_tick per tick with 50-150ms jitter
- SIGHUP handler: live-reload max_workers/spawns_per_tick/claude_bin without restart
- Add `boi daemon reload` subcommand (sends SIGHUP to daemon.pid)
- Add try_load()/try_load_from() for fallible config parsing (bad config = no-op)
- docs/daemon.md: tick cadence, spawns_per_tick semantics, hot-reload behavior
- 14 new tests: daemon_batch (8) + daemon_hotreload (6), all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds automatic dependency analysis to the boi dispatch flow so ordering mistakes are caught mechanically before a spec queue goes wrong.

Core additions:
- src/cli/plan.rs: DAG model (collect_artifacts, detect_implicit_deps, topological sort, cycle detection) + LLM critique pass via claude-haiku. Critique is cached by hash(topology + titles) to avoid re-spending tokens.
- src/cli/dispatch_many.rs: boi dispatch-many accepts N specs, runs plan first, and gates dispatch on the critique result (block=refuse, warn=prompt, clean=auto-approve). Dispatches in topological order with --after chain.
- src/cli/dispatch.rs: lightweight implicit-dep WARN on single dispatch when no --after is provided. Full LLM pass reserved for plan/dispatch-many.
- src/runtime/openrouter.rs: HTTP client for OpenRouter (haiku critique calls).
- src/failure.rs: structured failure-reason capture for runner diagnostics.
- src/cli/why.rs: boi why <spec-id> explains the last failure from the DB.

Documentation:
- README.md: boi plan + dispatch-many sections with examples
- docs/dag-reassess.md: model explanation + command selection guide
- SKILL.md: updated CLI table

Tests: dag_build (empty/single/chain/fan-out/diamond/cycle/implicit-dep), dispatch_many (right --after chain, cycle refusal, --force behaviour), dispatch_dag_warn (warn-on-implicit-dep for single dispatch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Motivation
Multi-spec BOI tracks had to be re-chained manually via --after flags at dispatch time. In this session's 5-spec track, ordering had to be verified by hand; the wrong order only surfaces when a dependent task fails mid-run because upstream work wasn't ready yet.
Before: whoever dispatches must remember in-flight deps + express them via --after. Easy to mis-order. In-flight specs can expand scope after a dependent is queued, silently breaking the assumed contract.
After: boi dispatch-many runs a DAG analysis + LLM critique before any dispatch happens. Wrong orderings are flagged with suggested fixes. boi dispatch gets a lightweight implicit-dep WARN for free.
Also: typed failure reasons + inline display in boi status (closes SO S12 gap)
Problem: Audit (2026-04-29) found that 8/8 recent failed specs had spec.error = NULL. boi status only showed ✗ S1234 Title 2m ago, with nothing about why. To find out, users had to dig through daemon logs. This broke SO S12 ("loud failures") at the user surface.
Fix: Structured FailureReason enum + persistence + rendering.
Typed enum (src/failure.rs)
Helpers: short_summary() (one-line for status), detail() (multi-line for boi why).
Serialized as JSON to spec.error / task.error columns. Falls back to Other { message } for legacy NULL/string values.
Failure capture (src/queue.rs, src/worker.rs, src/runner.rs)
New helpers queue.fail_spec(id, FailureReason) and queue.fail_task(spec_id, task_id, FailureReason) write the JSON reason to the DB and emit structured telemetry events (boi.spec.failed / boi.task.failed).
Every failure path now maps to a typed reason:
- RuntimeError::Timeout → Timeout { ... }
- NonZeroExit("HTTP 429") → ProviderRateLimit { ... }
- NonZeroExit("HTTP 4xx/5xx") → ProviderHttp { ... }
- ProviderAuth { ... }
- ModelResolution { ... }
- VerifyFailed { ... }
- WorkerCrash { ... }
Status rendering (src/cli/status.rs)
Failed specs now render a second indented line with the error reason:
Long summaries are truncated with …. Pass --verbose / -v for multi-line detail().
boi why <spec-id> (src/cli/why.rs)
Fast forensics: prints the full FailureReason::detail() for any spec.
Tests
- failure_reason: all variants roundtrip, truncation, invalid-JSON fallback
- failure_capture: every failure path produces typed reason; NO NULL errors
- status_render_error: no-error → no second line; typed error → short summary; long → ellipsis; verbose → detail
Implementation
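As a rough illustration of the typed failure reasons described above, the sketch below models a few of the variants together with a one-line summary and a classifier for non-zero-exit messages. Only the variant names, short_summary(), and the "HTTP 429" / "HTTP 4xx/5xx" mapping come from this PR; the fields, the 40-character limit, and the truncate and classify_exit helpers are hypothetical and are not the actual contents of src/failure.rs.

```rust
// Hypothetical sketch: variant names follow the PR text, but every field
// below is an illustrative assumption, not the real definition.
#[derive(Debug, Clone, PartialEq)]
enum FailureReason {
    Timeout { after_secs: u64 },
    ProviderRateLimit { status: u16 },
    ProviderHttp { status: u16 },
    Other { message: String },
}

// Assumed display width for the status line; not taken from the PR.
const SUMMARY_MAX_CHARS: usize = 40;

impl FailureReason {
    /// One-line summary, cut to SUMMARY_MAX_CHARS with a trailing ellipsis.
    fn short_summary(&self) -> String {
        let full = match self {
            FailureReason::Timeout { after_secs } => format!("timeout after {}s", after_secs),
            FailureReason::ProviderRateLimit { status } => {
                format!("provider rate limit (HTTP {})", status)
            }
            FailureReason::ProviderHttp { status } => format!("provider HTTP error {}", status),
            FailureReason::Other { message } => message.clone(),
        };
        truncate(&full, SUMMARY_MAX_CHARS)
    }
}

/// Keep at most `max` characters, replacing the cut tail with '…'.
fn truncate(s: &str, max: usize) -> String {
    if s.chars().count() <= max {
        s.to_string()
    } else {
        let mut out: String = s.chars().take(max.saturating_sub(1)).collect();
        out.push('…');
        out
    }
}

/// Map a non-zero-exit message such as "HTTP 429" to a typed reason.
/// Anything that does not parse as "HTTP <code>" falls back to Other.
fn classify_exit(message: &str) -> FailureReason {
    let status = message
        .strip_prefix("HTTP ")
        .and_then(|rest| rest.trim().parse::<u16>().ok());
    match status {
        Some(429) => FailureReason::ProviderRateLimit { status: 429 },
        Some(code) if (400..600).contains(&code) => FailureReason::ProviderHttp { status: code },
        _ => FailureReason::Other { message: message.to_string() },
    }
}
```

Falling back to Other rather than returning an error mirrors the PR's stated goal that no failure path leaves the error column NULL.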
New commands
boi plan [specs...]: loads in-flight + queued specs from DB, builds a DAG, detects cycles, topologically sorts it, then runs an LLM critique (claude-haiku) that flags specs that should depend on each other but don't, wrongly-serial specs that could run in parallel, and scope contradictions.
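The deterministic part of this step (topological sort with cycle detection) can be sketched with Kahn's algorithm. This is only an assumption about the approach: the PR does not say which algorithm src/cli/plan.rs uses, and the topo_sort name, its signature, and the use of plain string spec ids are all hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Kahn's algorithm over spec ids. `deps` maps each spec to the specs it
/// must run after. Returns a dispatch order, or Err listing the specs that
/// could not be ordered because they sit in (or downstream of) a cycle.
fn topo_sort(deps: &BTreeMap<String, BTreeSet<String>>) -> Result<Vec<String>, Vec<String>> {
    // Collect every node, including ones that only appear as a dependency.
    let mut nodes: BTreeSet<&str> = BTreeSet::new();
    for (spec, ups) in deps {
        nodes.insert(spec.as_str());
        for u in ups {
            nodes.insert(u.as_str());
        }
    }

    // indegree = number of unmet upstream deps; dependents = reverse edges.
    let mut indegree: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (spec, ups) in deps {
        for u in ups {
            *indegree.get_mut(spec.as_str()).unwrap() += 1;
            dependents.entry(u.as_str()).or_default().push(spec.as_str());
        }
    }

    // Start with specs that depend on nothing.
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(n, _)| *n)
        .collect();

    let mut order = Vec::new();
    while let Some(n) = ready.pop_front() {
        order.push(n.to_string());
        if let Some(ds) = dependents.get(n) {
            for &d in ds {
                let remaining = indegree.get_mut(d).unwrap();
                *remaining -= 1;
                if *remaining == 0 {
                    ready.push_back(d);
                }
            }
        }
    }

    if order.len() == nodes.len() {
        Ok(order)
    } else {
        // Nodes never released still have a non-zero indegree.
        Err(indegree
            .into_iter()
            .filter(|(_, d)| *d > 0)
            .map(|(n, _)| n.to_string())
            .collect())
    }
}
```

Using ordered maps keeps the output deterministic for a given input, which matters if the resulting order feeds a cache key or a test assertion.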
Critique is cached by hash(DAG topology + spec titles) to avoid re-spending tokens on identical state.
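One way such a cache key could be derived is shown below. It is a minimal sketch under stated assumptions: the critique_cache_key name, the edge and (id, title) tuple representations, and the use of the standard library's DefaultHasher are illustrative choices, not what the PR actually implements.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cache key over the DAG topology and spec titles.
/// `edges` are (upstream, downstream) pairs; `specs` are (id, title) pairs.
/// Both are sorted first so the key does not depend on input order.
fn critique_cache_key(edges: &[(String, String)], specs: &[(String, String)]) -> u64 {
    let mut sorted_edges: Vec<&(String, String)> = edges.iter().collect();
    sorted_edges.sort();
    let mut sorted_specs: Vec<&(String, String)> = specs.iter().collect();
    sorted_specs.sort();

    let mut hasher = DefaultHasher::new();
    sorted_edges.hash(&mut hasher);
    sorted_specs.hash(&mut hasher);
    hasher.finish()
}
```

DefaultHasher's algorithm is not guaranteed to stay the same across Rust releases, so a cache that persists on disk would probably want a hash with a fixed specification instead.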
boi dispatch-many <spec1> <spec2> ...: runs plan first, then:
- block-severity concerns → refuse + print concerns, exit non-zero
- warn-severity → print proposed order + prompt (or --yes / --force)
- clean → auto-dispatch in topological order with correct --after chain
boi why <spec-id>: explain the last failure for a spec from DB (uses the new failure.rs structured failure capture).
Lightweight single-dispatch gate
boi dispatch <spec> now runs deterministic implicit-dep detection (no LLM) when in-flight specs exist. If artifact overlap is detected and no --after was provided: WARN + suggest --after. Add --skip-plan to bypass.
Core modules
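The deterministic implicit-dep check behind the single-dispatch gate can be pictured as a set intersection over the artifacts each spec touches. The sketch below borrows the detect_implicit_deps name from src/cli/plan.rs, but the SpecArtifacts type, the signature, and the idea that artifacts are plain path strings are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical view of a spec: its id plus the artifact paths it touches.
struct SpecArtifacts {
    id: String,
    artifacts: BTreeSet<String>,
}

/// For each in-flight spec that shares at least one artifact with the new
/// spec, return (in_flight_id, shared_artifacts). An empty result means
/// no implicit dependency was found.
fn detect_implicit_deps(
    new_spec: &SpecArtifacts,
    in_flight: &[SpecArtifacts],
) -> Vec<(String, Vec<String>)> {
    in_flight
        .iter()
        .filter_map(|other| {
            let shared: Vec<String> = new_spec
                .artifacts
                .intersection(&other.artifacts)
                .cloned()
                .collect();
            if shared.is_empty() {
                None
            } else {
                Some((other.id.clone(), shared))
            }
        })
        .collect()
}
```

In this sketch a caller would run the check only when no --after was given and turn each returned pair into a WARN line that suggests adding --after for that spec id.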
- src/cli/plan.rs: collect_artifacts, detect_implicit_deps, topo-sort, cycle detection + LLM critique pass
- src/cli/dispatch_many.rs
- src/cli/dispatch.rs
- src/runtime/openrouter.rs
- src/failure.rs
- src/cli/why.rs: boi why command
- src/cli/status.rs
Test coverage
- dag_build: empty queue, single spec, two-spec chain, fan-out, diamond, cycle detection (errors loud), implicit-dep detection
- dispatch_many: correct --after chain for 3-spec implicit chain; cycle in declared deps → refusal; --force overrides warn but not block
- dispatch_dag_warn: WARN emitted when artifact overlap with no --after
- failure_reason: roundtrip, truncation, legacy fallback
- failure_capture: no NULL errors on any failure path
- status_render_error: rendering correctness + verbose mode
Example: before vs after (DAG)
Before (manual, fragile):
After (automatic):
Do not merge — Mike reviews.
🤖 Generated with Claude Code