
daemon: batched dequeue per tick + SIGHUP config hot-reload #13

Open
mrap wants to merge 1 commit into pipeline-v2-S2271 from boi/SD979-daemon-batch-hotreload

mrap commented Apr 29, 2026

Motivation

Problem 1 — 40s ramp time

The daemon loop dequeued one spec per tick with a ~5s sleep between ticks. Starting from 0 workers with max_workers=8, filling every slot took 8 ticks, or ~40s. During that window specs sat idle in the queue.

Problem 2 — restart orphans in-flight workers

Bumping max_workers required a boi daemon restart, which kills every running worker mid-task. That made restarts the primary source of disruption during live tuning.

What changed

spawns_per_tick (T2AD4 + T47CE)

  • New spawns_per_tick: Option<u32> field in Config (default 4).
  • Daemon loop now computes to_spawn = cap_remaining.min(spawns_per_tick) per tick and drains up to that many queue entries.
  • Micro-jitter: 50–150ms randomized sleep between successive spawns within a tick smooths the Anthropic API cold-start burst.
  • Each spawned worker logs its batch slot (batch slot 1/4) for easy telemetry correlation.
  • boi config set spawns_per_tick N now works.
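
The per-tick batching described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in this PR: `run_tick`, the `Jitter` helper (a small xorshift generator standing in for whatever randomness the daemon uses), and the `spawn` callback are hypothetical names. It also assumes the slot label is counted against the number of spawns planned for the tick, and that a failed spec is simply dropped.

```rust
use std::collections::VecDeque;
use std::thread::sleep;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical jitter source: a tiny xorshift64 generator seeded from the
/// clock, used only to keep this sketch free of external crates.
struct Jitter(u64);

impl Jitter {
    fn from_clock() -> Self {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(0x9E37_79B9_7F4A_7C15);
        Jitter(nanos | 1) // xorshift state must never be zero
    }

    /// Returns a delay in the inclusive range 50..=150 ms.
    fn next_delay(&mut self) -> Duration {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        Duration::from_millis(50 + self.0 % 101)
    }
}

/// Runs one tick: tries up to min(cap_remaining, spawns_per_tick) queued specs.
/// `spawn` is a hypothetical callback; an Err loses only that slot.
/// Returns the number of workers that actually started.
fn run_tick<F>(
    queue: &mut VecDeque<String>,
    running: u32,
    max_workers: u32,
    spawns_per_tick: u32,
    jitter: &mut Jitter,
    mut spawn: F,
) -> u32
where
    F: FnMut(&str) -> Result<(), String>,
{
    let cap_remaining = max_workers.saturating_sub(running);
    let to_spawn = cap_remaining.min(spawns_per_tick).min(queue.len() as u32);
    let mut started = 0;
    for slot in 1..=to_spawn {
        let Some(spec) = queue.pop_front() else { break };
        if slot > 1 {
            // Micro-jitter between successive spawns within the same tick.
            sleep(jitter.next_delay());
        }
        match spawn(&spec) {
            Ok(()) => {
                started += 1;
                println!("spawned {spec} (batch slot {slot}/{to_spawn})");
            }
            // Assumption: the failed spec is dropped; a real daemon may requeue it.
            Err(err) => eprintln!("spawn of {spec} failed (batch slot {slot}/{to_spawn}): {err}"),
        }
    }
    started
}
```

With 4 workers already running, max_workers=8, spawns_per_tick=4 and six queued specs, this sketch attempts four spawns in the tick and leaves two specs in the queue.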

SIGHUP hot-reload (T7AFE)

  • signal_hook::flag::register(SIGHUP, reload_flag) in the daemon.
  • Before each tick: if flag set, config::try_load() → update wc.max_workers, wc.spawns_per_tick, wc.claude_bin. Parse failure = no-op + loud log.
  • Only those three fields are live. Everything else stays frozen at startup. In-flight workers keep their original WorkerConfig.
  • New boi daemon reload subcommand sends SIGHUP to ~/.boi/pids/daemon.pid. Typical path: boi config set max_workers 10 && boi daemon reload.
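
A rough sketch of the reload check, under stated assumptions. In the daemon the flag is set by the `signal_hook::flag::register(SIGHUP, reload_flag)` call mentioned above; here it is a plain `AtomicBool` set by hand so the example needs no external crate. The `key = value` parser, the `queue_dir` field, the default values, and the `maybe_reload` and `defaults` names are all hypothetical. Only the three hot fields and the "parse failure is a no-op" rule come from this PR.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Clone, Debug, PartialEq)]
struct WorkerConfig {
    max_workers: u32,
    spawns_per_tick: u32,
    claude_bin: PathBuf,
    // Hypothetical example of a field that stays frozen at startup.
    queue_dir: PathBuf,
}

/// Hypothetical defaults; only spawns_per_tick = 4 is stated in the PR.
fn defaults() -> WorkerConfig {
    WorkerConfig {
        max_workers: 8,
        spawns_per_tick: 4,
        claude_bin: PathBuf::from("claude"),
        queue_dir: PathBuf::from("/tmp/boi-queue"),
    }
}

/// Stand-in for config::try_load(): parses "key = value" lines on top of the
/// defaults and returns Err on the first malformed entry.
fn try_load_from(text: &str) -> Result<WorkerConfig, String> {
    let mut cfg = defaults();
    for line in text.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("malformed line: {line}"))?;
        let (key, value) = (key.trim(), value.trim());
        match key {
            "max_workers" => {
                cfg.max_workers = value.parse::<u32>().map_err(|e| format!("max_workers: {e}"))?
            }
            "spawns_per_tick" => {
                cfg.spawns_per_tick = value.parse::<u32>().map_err(|e| format!("spawns_per_tick: {e}"))?
            }
            "claude_bin" => cfg.claude_bin = PathBuf::from(value),
            "queue_dir" => cfg.queue_dir = PathBuf::from(value),
            other => return Err(format!("unknown key: {other}")),
        }
    }
    Ok(cfg)
}

/// Copies only the three live fields; everything else keeps its startup value.
fn apply_reload(wc: &mut WorkerConfig, fresh: &WorkerConfig) {
    wc.max_workers = fresh.max_workers;
    wc.spawns_per_tick = fresh.spawns_per_tick;
    wc.claude_bin = fresh.claude_bin.clone();
}

/// Called before each tick. Clears the flag and returns true only if a new
/// config was applied; a parse failure logs loudly and leaves `wc` untouched.
fn maybe_reload(flag: &AtomicBool, wc: &mut WorkerConfig, config_text: &str) -> bool {
    if !flag.swap(false, Ordering::SeqCst) {
        return false;
    }
    match try_load_from(config_text) {
        Ok(fresh) => {
            apply_reload(wc, &fresh);
            true
        }
        Err(err) => {
            eprintln!("SIGHUP reload FAILED, keeping current settings: {err}");
            false
        }
    }
}
```

Because each worker would be handed its own clone of `WorkerConfig` at spawn time, a later `apply_reload` on the daemon's copy does not reach workers that are already running.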

Mitigations

| Risk | Mitigation |
| --- | --- |
| Anthropic API 429s from burst spawns | 50–150ms per-spawn jitter; cap via spawns_per_tick |
| git worktree add index contention | Natural serialization: each spawn is sequential within the tick loop |
| Failure amplification | Each spawn's error breaks that slot only; loop continues |
| Bad SIGHUP config nuking live settings | try_load returns Err on parse failure; old values kept |

Tests

cargo test --lib daemon_batch      # 8 tests — all passing
cargo test --lib daemon_hotreload  # 6 tests — all passing

Covered scenarios:

  • Empty queue → 0 spawns
  • 1 eligible, cap=4, tick=4 → 1 spawn
  • 6 eligible, cap=4, tick=4 → 4 this tick (2 next tick)
  • 6 eligible, cap=8, tick=4 → 4 this tick
  • 4 eligible, cap=2, tick=4 → 2 this tick
  • apply_reload updates only hot fields
  • Bad config on SIGHUP → original wc retained, error logged
  • Missing config on SIGHUP → defaults returned (no crash)
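
The batching scenarios above reduce to a small table of inputs and expected spawn counts. The sketch below is illustrative rather than the test code in this PR: `spawn_count` and `check_scenarios` are hypothetical names, "cap" is read as the remaining worker capacity, and the empty-queue row assumes cap=4 and tick=4 since the list does not give them.

```rust
/// Number of specs to spawn this tick: bounded by the eligible queue entries,
/// the remaining worker capacity, and spawns_per_tick.
fn spawn_count(eligible: u32, cap_remaining: u32, spawns_per_tick: u32) -> u32 {
    eligible.min(cap_remaining).min(spawns_per_tick)
}

/// Rows are (eligible, cap, tick, expected spawns this tick).
const SCENARIOS: [(u32, u32, u32, u32); 5] = [
    (0, 4, 4, 0), // empty queue (cap and tick values assumed)
    (1, 4, 4, 1),
    (6, 4, 4, 4),
    (6, 8, 4, 4),
    (4, 2, 4, 2),
];

/// Checks every row and reports the first mismatch, if any.
fn check_scenarios() -> Result<(), String> {
    for (eligible, cap, tick, expected) in SCENARIOS {
        let got = spawn_count(eligible, cap, tick);
        if got != expected {
            return Err(format!(
                "eligible={eligible} cap={cap} tick={tick}: expected {expected}, got {got}"
            ));
        }
    }
    Ok(())
}
```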

Cold-start behavior (smoke run)

Not yet run against the live Anthropic API, so there is no smoke data yet. Once this merges to main and a batch of 6+ specs is queued, observe the batch slot logs and the API 429 rate.

Docs

  • docs/daemon.md: tick cadence, spawns_per_tick semantics, SIGHUP reload, what does/doesn't reload.
  • README.md: boi daemon reload added to command list.
  • SKILL.md: boi config set max_workers N && boi daemon reload is now the live-bump path.

Don't merge — Mike reviews.

🤖 Generated with Claude Code

- Add spawns_per_tick config field (default 4) to cap workers spawned per tick
- Rewrite daemon dequeue loop: drain up to spawns_per_tick per tick with 50-150ms jitter
- SIGHUP handler: live-reload max_workers/spawns_per_tick/claude_bin without restart
- Add `boi daemon reload` subcommand (sends SIGHUP to daemon.pid)
- Add try_load()/try_load_from() for fallible config parsing (bad config = no-op)
- docs/daemon.md: tick cadence, spawns_per_tick semantics, hot-reload behavior
- 14 new tests: daemon_batch (8) + daemon_hotreload (6), all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>