Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@

### fixes

- **fix:** rc18 — `_post_result_idle_watchdog` post-result hang root cause + AskUserQuestion final-keyboard clear + auto-continue outbox+UX. Three independent rc18 fixes shipped together.
- **#333 — post-result hang fix (Tier 1+2+3 + Task 4a):** rc17 (#549) added entry/exit/tick instrumentation to the watchdog; that instrumentation caught the limbo on channelo session `8876c902` (2026-05-17, 26.6 min wasted). Root cause: when Claude Code v2.1.143 closes stdout while keeping the subprocess alive, the watchdog exited early via `task_exited reason=reader_done`, bypassing the 600 s countdown — and stall-detector suppression cascades (post_result + MCP-heartbeat-driven children-active) hid the limbo from auto-cancel indefinitely. **Tier 1 (`claude.py`):** when `reader_done` fires while `proc.returncode is None`, the new `_post_result_subcountdown` re-arms a stdout-closed countdown, defers on pending control_request / ask_question, then SIGTERMs the process group after `timeout_s`, 5 s grace, SIGKILL if still alive. New `task_exited` reasons: `reader_done_but_alive_timeout`, `subprocess_exited_during_subcountdown`. **Tier 2 (`runner_bridge.py`):** new `_POST_RESULT_LIMBO_THRESHOLD_S = 660.0` class const + `_post_result_idle_age_seconds()` helper; when post-result idle age exceeds the threshold AND no other expected-wait flag is set, the stall detector stops suppressing auto-cancel. One-shot `progress_edits.post_result_limbo_detected` warning. **Tier 3 (`claude.py`):** new `runner.limbo_detected` warning fired 30 s into the subcountdown when the subprocess is still alive — picked up automatically by `untether-issue-watcher` for `auto:error-report` filing on future regressions. **Task 4a (`runner.py` + `claude.py`):** `JsonlStreamState.lifecycle_state` + `_transition_lifecycle()` helper emits `subprocess.state.<name>` info logs at every transition (`reader_eof`, `subcountdown`, `limbo`, `sigterm_sent`, `sigkill_sent`, `exited`). Permanent canary for future hang-class issues. 7 new tests (4 in `tests/test_claude_runner.py`, 3 in `tests/test_exec_bridge.py`) [#333](https://github.com/littlebearapps/untether/issues/333)
- **#550 — AskUserQuestion final-keyboard clear:** after the user answers the last question in a multi-question `AskUserQuestion` flow, the inline keyboard on the question message is now stripped via `ctx.executor.edit` (Approach A from the rc18 handover). Previously the buttons stayed clickable and fired `ask_question.flow_missing` warnings since the flow state was already cleaned up. Failure modes preserved: `answer_ask_question_with_options` returning `False` leaves the buttons in place (so the user can retry); `ctx.executor.edit` raising logs `ask_question.keyboard_clear_failed` but does NOT block the answer-sent return. 4 new tests in `tests/test_ask_user_question.py` [#550](https://github.com/littlebearapps/untether/issues/550)
- **#551 — auto-continue outbox + UX (Tier 0 + Tier 1):** **Tier 0:** outbox files written by subprocess 1 during the stuck-after-tool-results window are now delivered BEFORE subprocess 2 spawns, eliminating the ~3.6% silent loss observed on lba-1. The pre-swap call mirrors the existing `deliver_outbox_files` plumbing at the final-message site (cleanup=True so subprocess 2 starts fresh). Failure to deliver does NOT block auto-continue — the recovery is more important than any single batch of files; new `outbox.delivered_pre_auto_continue` info + `outbox.auto_continue_delivery_failed` warning logs. **Tier 1:** the auto-continue Telegram notice text changed from `⚠️ Auto-continuing — Claude stopped before processing tool results` to `🔁 Auto-resuming session after upstream Claude Code event`. The 🔁 prefix signals recovery rather than failure and discourages users from `/cancel`-ing the salvage. **Task 4b (`runner.py` + `runner_bridge.py`):** `JsonlStreamState.stall_suppression_counts: dict[str, int]` + `_bump_stall_suppression()` helper increments per-suppression-reason counters at three sites (`expected_wait`, `post_result`, `children_active`). `session.summary` now includes a stable `stall_suppressions=expected_wait:N,post_result:N,children_active:N` summary line so log audits can spot suppression cascades without parsing nested JSON. Stretch tiers (#551 Tier 2/3/4 — catalog-staleness suppression window, rate-limit-aware deferral, registry preservation) deferred to a future patch [#551](https://github.com/littlebearapps/untether/issues/551)
- **fix:** rc17 — `_post_result_idle_watchdog` entry/exit/tick instrumentation (#333) + `last_bg_bash_launched_at` scalar (latent #347 sibling defect). Channelo VPS on rc16 (which already shipped the #544 ScheduleWakeup arm-delay scalar) hit a 43+ min post-result hang on session `b5c1c3e0-…` with `pending_wakeup=False` — i.e. NO `ScheduleWakeup` involved, so the #544 fix didn't apply. Logs showed `post_result=True` (so `state.result_received_at` IS set), `[watchdog]` config used the default `post_result_idle_enabled=true`, and the subprocess + children stayed alive (so `reader_done` was NOT set) — yet **zero** `claude.post_result_idle.closing_stdin` / `…deferred` log lines existed despite elapsed ≫ 600 s. Three of the four #333 candidates ruled out via logs + live `py-spy dump`; the remaining "task crashed silently / never started" candidate cannot be discriminated without entry/exit instrumentation. The CHANGELOG line in rc16 deferred #333 to v0.35.4 pending instrumentation — rc17 lands the instrumentation now and overrides that deferral. **Instrumentation:** `_post_result_idle_watchdog` now emits `claude.post_result_idle.task_started` (session_id, timeout_s, poll_interval_s) at entry; `claude.post_result_idle.tick` every iteration (armed, elapsed_s, effective_timeout_s, dead_wakeup, pending_requests, pending_asks, would_close, last_bg_bash_launched_at_age_s, last_schedule_wakeup_arm_delay); `claude.post_result_idle.tick_error` (warning + exc_info) on transient per-tick failures with one-interval backoff; and `claude.post_result_idle.task_exited` (reason ∈ `reader_done` | `stdin_closed` | `cancelled` | `loop_exited`) in a guaranteed `finally`. Per-tick `try/except` (not loop-wide) mirrors `_subprocess_watchdog` / `_drain_catalog_refresh` conventions so a transient error never cancels the sibling `_iter_jsonl_events` task in the task group. Verbose by design — at 30 s poll × hours of session = O(120) lines, trivial; rate-limiting now would create ambiguity in the next reproduction. **`last_bg_bash_launched_at` scalar:** `_clear_background_handle` (claude.py:550) pops `live_bg_bashes` on tool_result mirroring the original #507 ScheduleWakeup defect that #544 fixed via a scalar high-water-mark; new `ClaudeStreamState.last_bg_bash_launched_at: float | None` is set in `_register_background_handle` at the `Bash + run_in_background` branch, NOT cleared in `_clear_background_handle`, and reset on the same fresh-user-prompt path that resets `last_schedule_wakeup_arm_delay`. Critically a LAUNCH tracker, not a LIFETIME tracker — bg-bashes can outlive multiple user turns (long `npm install`, `tail -f`) so per-turn reset is correct. **Observability-only today**; the bridge's existing `_has_fresh_bash_output` / `_has_recent_bash_action` (runner_bridge.py:1738, 1753) remain the higher-fidelity bash-liveness proxies and the new scalar deliberately does NOT replace them in any suppression path. 7 new tests in `tests/test_claude_runner.py` (5 scalar lifecycle + 2 watchdog instrumentation covering `task_started`/`tick`/`task_exited` ordering and the `reader_done` exit path). The actual fix for whatever the new instrumentation reveals lands in a follow-up rc — rc17 is the diagnostic [#333](https://github.com/littlebearapps/untether/issues/333) (cross-ref [#544](https://github.com/littlebearapps/untether/issues/544), [#347](https://github.com/littlebearapps/untether/issues/347), [#374](https://github.com/littlebearapps/untether/issues/374))
- **fix:** rc16 — `ScheduleWakeup` post-result hold-open redux. The rc11 #507 fix added a `state.live_wakeups_arm_delay: dict[str, float]` populated in `_register_background_handle` and read in `_post_result_idle_watchdog` to shorten the 600 s timeout to `max_armed_delay + 60 s` when /loop is OFF. But the dict was wiped by `_clear_background_handle` on the ScheduleWakeup tool_result — which is the schedule-confirmation, not a terminal signal — so by the time the watchdog ticked (after the `result` event, which lands AFTER tool_result) the dict was empty and the dead-wakeup shortcut never engaged. Live impact: channelo VPS auditor-toolkit session `d11739ee-…` on rc15, 24+ min hold-open with `pending_wakeup=False` despite `last_action='tool:ScheduleWakeup (done)'`. Replaced the per-tool_id dict with `ClaudeStreamState.last_schedule_wakeup_arm_delay: float | None` — a per-turn scalar high-water-mark (`max` semantics for multi-wakeup turns) that survives `_clear_background_handle` and resets on each fresh user prompt (`StreamUserMessage` with non-tool_result content; mixed batches preserve the scalar). 4 new tests in `tests/test_claude_runner.py` cover the full tool_use → tool_result → result lifecycle (the #507 unit tests bypassed `_clear_background_handle`, which is why this slipped through), multi-wakeup max selection, new-turn reset, and the mixed-batch edge case. The two existing #507 tests now seed the scalar instead of the dict. The broader background-task-lifecycle refactor (terminal-vs-arm signal per primitive + deadline-expiry sweeps) tracked in [#374](https://github.com/littlebearapps/untether/issues/374) stays in v0.35.4; the sibling defect where the 600 s safety-net watchdog silently doesn't fire stays in [#333](https://github.com/littlebearapps/untether/issues/333) for v0.35.4 pending entry/exit instrumentation [#544](https://github.com/littlebearapps/untether/issues/544)
- **fix:** rc14 — `claude.rate_limit_event` logs no longer drop `retry_after_s` on subscription-cap (reset-window) throttles. The Claude CLI emits two shapes of `rate_limit_event`: a full form carrying `retry_after_ms` (already covered) and a bare/reset-window form that carries `requests_reset` / `tokens_reset` ISO timestamps but no `retry_after_ms`. Untether's translate path only consumed `retry_after_ms`, so reset-window events fell into the "no retry hint" branch — `retry_after_s` stayed `None`, `ClaudeStreamState.rate_limit_total_s` never accumulated, and the chat surfaced the generic "⏳ Rate limited — waiting to retry" with no actionable wait time. The rc13 audit observed this firing across a 5-event burst on the `bip` chat that preceded a subscription-cap exhaustion across 3 chats — every event logged `retry_after_s=None cumulative_s=0.0` despite the upstream payload containing actionable wait info. New `_derive_retry_after_s(info)` helper in `runners/claude.py` picks the EARLIER of `requests_reset` / `tokens_reset` (the rate limit lifts as soon as either budget refills), clamps ≥ 0, tolerates both `Z` and `+00:00` ISO suffixes, and returns `None` for unparseable / missing timestamps. The translate path now falls back to the derived value when `retry_after_ms` is `None` and tracks which path fed the field via a new `retry_after_source=retry_after_ms|reset_ts` log key. The structured `claude.rate_limit_event` is also enriched to include every present `RateLimitInfo` field under `info=...` (`requests_limit`, `requests_remaining`, `requests_reset`, `tokens_limit`, `tokens_remaining`, `tokens_reset`, `retry_after_ms`) so future audits can see what upstream actually sent. The two subscription-error message variants observed in the audit ("out of extra usage", "hit your limit") already map to the same friendly hint via `error_hints.py:52-60`, so no work is needed there. Pre-emptive 75/90% budget warnings are out of scope for this fix — deferred as a discrete feature. 4 new tests in `tests/test_claude_runner.py` (`test_translate_rate_limit_event_derives_retry_after_from_reset_ts`, `test_translate_rate_limit_event_prefers_earlier_reset_when_both_present`, `test_translate_rate_limit_event_retry_after_ms_takes_precedence`, `test_translate_rate_limit_event_handles_unparseable_reset_ts`); all four existing tests still pass [#518](https://github.com/littlebearapps/untether/issues/518)
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
name = "untether"
authors = [{name = "Little Bear Apps", email = "hello@littlebearapps.com"}]
maintainers = [{name = "Little Bear Apps", email = "hello@littlebearapps.com"}]
version = "0.35.3rc17"
version = "0.35.3rc18"
keywords = ["telegram", "claude-code", "codex", "opencode", "pi", "gemini-cli", "amp", "ai-agents", "coding-assistant", "remote-control", "cli-bridge"]
description = "Run AI coding agents from your phone. Bridges Claude Code, Codex, OpenCode, Pi, Gemini CLI, and Amp to Telegram with interactive permissions, voice input, cost tracking, and live progress."
readme = {file = "README.md", content-type = "text/markdown"}
Expand Down
7 changes: 4 additions & 3 deletions src/untether/runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -346,9 +346,10 @@ class JsonlStreamState:
lifecycle_state_entered_at: float = 0.0
# #333 Task 4b: per-suppression-reason counter, summarised in
# ``session.summary``. Bumped by the bridge stall detector each
# tick a suppression branch fires (post_result, children_active,
# expected_wait). Plain dict (not defaultdict) so the slots-dataclass
# encoding stays trivial; bump via ``counts.get(k, 0) + 1``.
# tick a suppression branch fires (``expected_wait``,
# ``post_result``, ``children_active``). Plain dict so the
# slots-dataclass encoding stays trivial; bump via
# ``counts[k] = counts.get(k, 0) + 1`` from the call site.
stall_suppression_counts: dict[str, int] = field(default_factory=dict)


Expand Down
Loading
Loading