Chain triggers time out against busy orchestrator sessions — default 5-min idle wait too short, 10-min MAX_TRIGGER_TIMEOUT cap also marginal #68

Description

@JeanBaptisteRenard

Context

The trigger-watcher's chain triggers ({"chain":[...],"wait":"idle"}) wait for the target session to become idle before injecting each step. Two limits govern this wait:

  • default per-step idle timeout: 300 000 ms (5 min)
  • hard cap: MAX_TRIGGER_TIMEOUT = 600_000 (trigger-watcher.js:38) — any timeout_ms above 10 min is rejected with invalid step timeout_ms
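For readers without the source open, the two limits above could be enforced roughly as follows. This is a minimal sketch, not the actual code in trigger-watcher.js: `resolveStepTimeout` and `DEFAULT_STEP_TIMEOUT_MS` are hypothetical names, and only `MAX_TRIGGER_TIMEOUT`, the 300 000 ms default, and the error string are taken from this issue.

```javascript
// Hypothetical sketch of the per-step timeout rules described in this issue.
const DEFAULT_STEP_TIMEOUT_MS = 300_000; // assumed name; 5 min default from the issue
const MAX_TRIGGER_TIMEOUT = 600_000;     // 10 min hard cap (trigger-watcher.js:38)

function resolveStepTimeout(step) {
  // No explicit timeout: fall back to the default idle wait.
  if (step.timeout_ms === undefined) {
    return { ok: true, timeout_ms: DEFAULT_STEP_TIMEOUT_MS };
  }
  const t = step.timeout_ms;
  // Anything non-positive, non-integer, or above the cap is rejected.
  if (!Number.isInteger(t) || t <= 0 || t > MAX_TRIGGER_TIMEOUT) {
    return { ok: false, error: "invalid step timeout_ms" };
  }
  return { ok: true, timeout_ms: t };
}
```

Under these assumptions, a step with `timeout_ms: 1800000` is rejected before any waiting starts, which matches the second observation below.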

Problem

Busy orchestrator sessions are the primary consumers of chain triggers (e.g. the auto-compact FORCE path: /compact → resume prompt), and they often never reach 5 consecutive minutes of idle: background sub-agent notifications, cron ticks, and tool results keep re-waking them. The chain then times out, and the failure is silent from the session's point of view.

Witnessed (2026-06-05 ~20:00 UTC+2)

  1. Auto-compact chain dropped against an active orchestrator session →
    {"ok":false,"error":"chain timeout","steps_completed":0,"total_waited_ms":300290}
    — correct session targeted, but never idle ≥5 min.
  2. Re-drop with timeout_ms: 1800000 → rejected (invalid step timeout_ms, cap is 600 000).
  3. Re-drop at the 600 000 cap succeeded (steps_completed:2, total_waited_ms:150198) — but only because the session deliberately went quiet (no crons/agents running). Without that cooperation, even 10 min is marginal for a session with periodic background activity.
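To make the three outcomes easier to reason about, here is a small virtual-clock simulation of a chain waiting for idle. It is only a sketch under assumptions: `simulateChain`, `isIdleAt`, and the 1 s poll interval are hypothetical, the real watcher is presumably asynchronous, and the model ignores the session becoming busy again after a step is injected. The result fields mirror the ones quoted above.

```javascript
// Simulation only: models "wait for idle, give up after timeout_ms" per step.
// isIdleAt(ms) is a hypothetical predicate: is the session idle at virtual time ms?
function simulateChain(steps, isIdleAt, pollMs = 1000) {
  let clock = 0;     // virtual ms since the trigger was dropped
  let completed = 0; // steps injected so far
  for (const step of steps) {
    const deadline = clock + step.timeout_ms;
    // Poll until the session reports idle or this step's budget runs out.
    while (!isIdleAt(clock)) {
      if (clock >= deadline) {
        return {
          ok: false,
          error: "chain timeout",
          steps_completed: completed,
          total_waited_ms: clock,
        };
      }
      clock += pollMs;
    }
    completed += 1; // the step would be injected here
  }
  return { ok: true, steps_completed: completed, total_waited_ms: clock };
}

// A session that never goes idle fails the first step at the default timeout.
const neverIdle = simulateChain([{ timeout_ms: 300_000 }], () => false);

// A session that goes quiet after 150 s lets both steps through.
const cooperative = simulateChain(
  [{ timeout_ms: 600_000 }, { timeout_ms: 600_000 }],
  (ms) => ms >= 150_000
);
```

In this model the outcome depends entirely on whether a quiet window opens before the deadline, which is why the third attempt only succeeded once the session cooperated.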

Suggestions (any subset)

  • Raise the default per-step timeout for chain steps (e.g. 10 min), since chains are typically dropped by the session itself expecting eventual delivery.
  • Raise or make configurable MAX_TRIGGER_TIMEOUT (e.g. 30–60 min) — a chain that fires late is almost always better than one that dies silently.
  • Alternative/complement: an "expire":"never"-style mode where the chain persists until the session is next idle, however long that takes (with the existing processed/*.result.json audit trail).
  • On chain timeout, consider re-queueing once instead of discarding — the failure mode today is fully silent for the target session (only discoverable by reading processed/*.result.json).
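Two of the suggestions above (a configurable cap and a single re-queue on timeout) might look something like this. Everything here is a sketch under assumptions: the `TRIGGER_MAX_TIMEOUT_MS` environment variable, the `retries` field on the trigger, and both function names are hypothetical and not part of the current watcher.

```javascript
const DEFAULT_MAX_TRIGGER_TIMEOUT = 600_000; // current hard cap from the issue

// Hypothetical: read the cap from an env var, falling back to today's value.
function maxTriggerTimeout(env = process.env) {
  const raw = Number(env.TRIGGER_MAX_TIMEOUT_MS);
  return Number.isInteger(raw) && raw > 0 ? raw : DEFAULT_MAX_TRIGGER_TIMEOUT;
}

// Hypothetical: decide what happens to a finished chain.
// A timed-out chain is re-queued once; anything else is archived as today.
function onChainResult(trigger, result) {
  const retries = trigger.retries ?? 0; // assumed bookkeeping field
  if (!result.ok && result.error === "chain timeout" && retries < 1) {
    return { action: "requeue", trigger: { ...trigger, retries: retries + 1 } };
  }
  return { action: "archive", trigger };
}
```

Capping the retry count at one keeps the re-queue from looping forever against a session that never goes idle, while still giving a late chain a second chance.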
