
ENH: drain-timeout heuristic for self-restart from active session (follow-up to #547 axis 3) #559

@nathanschram

Description

Follow-up to #547 axis 3, deferred from v0.35.3 (rc19 PR #555).

Context

#547 documented an agent self-restart pattern: agents edit untether.toml, then run systemctl --user restart untether from inside their own session, unaware that Untether has already hot-reloaded the change. The drain has nothing to drain (the agent IS the only active session), so it times out at 120s, force-exits with outbox.fail_pending count=1, and silently drops the agent's final answer to the user.

rc19 broke the pattern at its source via:

These break the pattern for any agent that reads the preamble or sees the confirmation. Defensive axis 3 (drain-timeout shortening) is still missing.

Scope (v0.35.4)

In src/untether/telegram/loop.py (_drain_and_exit shutdown path), detect:

"the only active_runs entry is a session that was started from a chat that is currently editing files in or near the Untether config directory" — OR more simply, when shutdown.drain.progress would otherwise wait the full 120s on a single session

Shorten the drain timeout to 5-10s in that case. This doesn't fix the unnecessary restart (axes 1+2 do that); it stops the worst symptom (force-exit + lost outbox message) when the pattern recurs (preamble disabled, agent override, etc.).
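A minimal sketch of the detection step, assuming the shutdown path can see the list of active runs and, optionally, the chat that triggered the shutdown. ActiveRun, pick_drain_timeout, trigger_chat_id and both constants are hypothetical names, not Untether's API. How the triggering chat would be identified is also an open assumption: a systemctl restart typically reaches the process as a plain SIGTERM with no chat attached, so when no chat is known the sketch falls back to the simpler single-session rule.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical values: 120s is the current drain timeout described above,
# 10s is the upper end of the proposed 5-10s range.
DEFAULT_DRAIN_TIMEOUT_S = 120.0
SELF_RESTART_DRAIN_TIMEOUT_S = 10.0


@dataclass(frozen=True)
class ActiveRun:
    """Hypothetical stand-in for one entry in active_runs."""

    session_id: str
    chat_id: int


def pick_drain_timeout(
    active_runs: list[ActiveRun], trigger_chat_id: int | None = None
) -> float:
    """Return the drain timeout to use for this shutdown (sketch)."""
    if len(active_runs) != 1:
        # Zero runs drain instantly; several runs deserve the full grace period.
        return DEFAULT_DRAIN_TIMEOUT_S
    if trigger_chat_id is None:
        # Simpler variant: a lone session would otherwise hold the full 120s.
        return SELF_RESTART_DRAIN_TIMEOUT_S
    if active_runs[0].chat_id == trigger_chat_id:
        # The only active session belongs to the chat that asked for the restart.
        return SELF_RESTART_DRAIN_TIMEOUT_S
    return DEFAULT_DRAIN_TIMEOUT_S
```

Falling back to the single-session rule when no chat can be attributed also shortens the grace period for an operator-initiated restart that happens to have one legitimately busy session; whether that trade-off is acceptable is a judgment call for the implementation.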

Related files

  • src/untether/telegram/loop.py: _drain_and_exit / shutdown.drain.progress loop
  • src/untether/telegram/commands/restart.py: already implements graceful restart via /restart; check whether it has a comparable short-circuit for self-initiated cases
  • src/untether/shutdown.py: shutdown state and drain logic
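The drain loop in these files can be sketched in plain asyncio to show where a shortened timeout plugs in. This is not Untether's actual code: drain, its parameters and the log format (borrowed from the shutdown.drain.progress event name) are hypothetical. Because the timeout is an argument, the self-restart case can pass a 5-10s value without changing the loop; the return value is the number of runs still pending at the deadline, which is where the real path force-exits.

```python
from __future__ import annotations

import asyncio
import logging

log = logging.getLogger("untether.shutdown.sketch")  # hypothetical logger name


async def drain(
    tasks: set[asyncio.Task], timeout_s: float, progress_every_s: float = 5.0
) -> int:
    """Wait for tasks to finish, logging progress; return how many are still pending."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    pending = {t for t in tasks if not t.done()}
    while pending:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # deadline hit: the caller would force-exit here
        # Wake up at least every progress_every_s to report how many runs remain.
        _, pending = await asyncio.wait(
            pending, timeout=min(progress_every_s, remaining)
        )
        log.info("shutdown.drain.progress pending=%d", len(pending))
    return len(pending)
```

With a session that can never finish (the agent blocked on its own restart), the call returns after roughly timeout_s with a pending count of 1, so shortening the timeout directly bounds how long that self-deadlock lasts.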

Acceptance

  • New test in tests/test_telegram_loop.py or tests/test_shutdown.py demonstrating the drain timeout is shortened when active_runs == 1 and the active session's chat is the one that triggered the shutdown
  • Manual integration test on @untether_dev_bot: prompt Claude to "edit untether.toml AND THEN restart" — observe drain completes in 5-10s, outbox message NOT dropped

Original context

See #547 for the full incident report (2026-05-16 15:25-15:30 AEST, tax-chat Claude session e339ff8e).

Metadata

Labels: bug (Something isn't working), enhancement (New feature or request), severity:minor (Small UX gap, edge case, cosmetic issue; doesn't block any workflow)
