Skip to content

fix(rc20): #524 outbox + #526 watchdog approval-pending follow-ups#556

Merged
Nathan Schram (nathanschram) merged 1 commit into
devfrom
feature/rc20-issue-followups
May 19, 2026
Merged

fix(rc20): #524 outbox + #526 watchdog approval-pending follow-ups#556
Nathan Schram (nathanschram) merged 1 commit into
devfrom
feature/rc20-issue-followups

Conversation

@nathanschram
Copy link
Copy Markdown
Member

Summary

Both #524 and #526 shipped fixes in 0.35.3rc19 (PR #555) but /monitor audits on 2026-05-18 showed each regression still firing in production because the rc19 patch landed in only one of two code paths. This PR closes the gaps for 0.35.3rc20.

#524 — outbox silently drops directory entries

  • rc19 surfaced the 📎 Outbox skipped notice on the normal-completion path but missed:
    • The pre-auto-continue delivery (stuck-after-tool-result recovery — subprocess 1's outbox scan)
    • The run_ok=False failed-run branch (entire outbox delivery skipped, so the user never learns about left-behind directories)
  • Extracted _surface_outbox_skipped helper in runner_bridge.py and wired it into both gap paths. Failed runs now do a cheap scan_outbox() (no file send) to collect skipped items and surface them.

#526 — approval-pending stalls misclassified

  • rc19 fixed only the bridge-side detector (progress_edits.stall_detected). The watchdog-side subprocess.liveness_stall in runner.py was unchanged, so untether-issue-watcher kept auto-filing GitHub issues on routine approval-pending sessions.
  • nsd audit (2026-05-18) showed a user cancel a productive 15-min session because the chat-side reassurance came too late (1800s threshold).
  • Added _recent_event_is_control_request(stream) helper. The watchdog now demotes WARN → INFO when the last JSONL event is control_request, and skips the auto-kill branch entirely.
  • Split _STALL_THRESHOLD_APPROVAL into _STALL_THRESHOLD_APPROVAL_FIRST = 600.0 and the existing 1800s refire — first reassurance now fires at 10 min, subsequent reminders gated by 30-min refire.
  • Reworded chat copy to make the affordance explicit: "⏳ Awaiting your approval (N min) — tap a button above to proceed (no action needed otherwise)".

Shared constant

  • Moved _APPROVAL_PENDING_REFIRE_S = 1800.0 to runner.py and imported into runner_bridge.py so the watchdog and bridge agree on a single 30-min cadence.

Test plan

  • Unit tests: `uv run pytest tests/test_outbox_delivery.py tests/test_exec_runner.py tests/test_exec_bridge.py --no-cov` — 263 passed, 1 skipped (symlinks on tmpfs)
  • Full suite: `uv run pytest --no-cov` — 2747 passed, 2 skipped
  • Lint: `uv run ruff check src/` — all checks passed
  • Format: `uv run ruff format src/ tests/` — applied
  • Lockfile: `uv lock` — synced (version-only change)
  • Dev-bot integration tests (Tier 1 Claude + Tier 7 commands) — to be run via `scripts/run-integration-tests.sh 0.35.3rc20 --manual` before fleet rollout per release-discipline.md
  • Post-merge daemon monitoring — tail `journalctl --user -u untether-issue-watcher` for 24h to confirm `subprocess.liveness_stall` WARN suppression in approval-pending sessions

Related

🤖 Generated with Claude Code

Both issues shipped rc19 fixes (PR #555) but /monitor audits on
2026-05-18 showed each regression still firing in production because
the rc19 patch landed in only one of two code paths.

#524 — outbox silently drops directory entries

rc19 surfaced 📎 Outbox skipped notices on the normal-completion path
in handle_message but missed two adjacent paths: the pre-auto-continue
delivery (subprocess 1 stuck-after-tool-result recovery) and the
run_ok=False failed-run branch. Both still silently dropped the agent's
intended deliverable.

This commit extracts the surfacing logic into _surface_outbox_skipped
in runner_bridge.py and wires it into both gap paths. On a failed run
the code still skips the actual file send (preserving the original
gating) but does a cheap scan_outbox() to collect skipped items and
surface them, so the user always learns what the agent intended to
ship. Honours the existing outbox_notify_skipped config flag and
filters the "..." overflow pseudo-entry from the user-facing block.

#526 — approval-pending stalls misclassified

rc19 demoted the bridge-side WARN (progress_edits.stall_detected) to a
paced INFO (subprocess.approval_pending) when _has_pending_approval()
returned true. The watchdog-side detector in runner.py (which emits
subprocess.liveness_stall and is the actual signal untether-issue-watcher
auto-files on) was untouched, so the daemon kept filing GitHub issues
on routine approval-pending sessions and the nsd audit (2026-05-18)
showed a user cancelling a productive 15-minute investigation because
the chat-side reassurance came too late (1800s threshold).

This commit:
- Adds _recent_event_is_control_request helper in runner.py — uses the
  stream's recent_events ring buffer as the approval-pending signal,
  consistent with the bridge's inline-keyboard predicate but accessible
  to runner-scope code.
- Plumbs the predicate into _watchdog_loop: when the last JSONL event
  is control_request, emit subprocess.approval_pending INFO instead of
  liveness_stall WARN. Skip the auto-kill branch entirely. Pace INFO
  emission to once per 30 min via the shared _APPROVAL_PENDING_REFIRE_S
  constant (now defined once in runner.py and imported by the bridge).
- Splits _STALL_THRESHOLD_APPROVAL into _STALL_THRESHOLD_APPROVAL_FIRST
  (600s) and the existing 1800s refire so the user gets a reassuring
  "tap a button above" chat message at 10 min on first occurrence,
  matching the watchdog's liveness threshold and avoiding the nsd-style
  early cancellation.
- Rewords the chat-side approval reminder copy to make the "tap a
  button above to proceed (no action needed otherwise)" affordance
  explicit, directly quoting the audit's recommended text.

Tests cover both code paths:
- tests/test_outbox_delivery.py (existing) — format helper + settings
  default unchanged; no new file-level tests needed.
- tests/test_exec_bridge.py — failed-run surfacing, notify_skipped=false
  suppression, only-overflow filter, two-tier first-reminder threshold,
  reworded copy.
- tests/test_exec_runner.py — predicate truth-table coverage, watchdog
  demotion via integration with a fake codex script emitting control
  _request, watchdog WARN still fires when no control_request is recent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 27b1b66a-1105-4e72-9db5-875e43519213

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/rc20-issue-followups

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant