Open
Description
Problem
When a multi-agent worker runs `relaunch.sh` (or the app is rebuilt or relaunched for any other reason), the orchestration loop loses track of the worker. After the relaunch, the worker is attached to a new SDK session. However, the orchestrator's `SendPromptAndWaitAsync` is still awaiting the TCS that belonged to the old session, and that TCS gets canceled or orphaned.
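The failure mode can be modeled in a few lines. This is a minimal Python asyncio sketch, not PolyPilot code. It assumes the orchestrator keys its pending waits by SDK session id. `Orchestrator`, `send_prompt_and_wait`, and `complete` are hypothetical stand-ins for `SendPromptAndWaitAsync` and the TCS. After the relaunch, the worker reports completion under its new session id, so nothing the orchestrator is awaiting gets resolved. The timeout exists only so the demo terminates; without one, the wait would hang as described above.

```python
import asyncio


class Orchestrator:
    """Hypothetical model: one pending future per SDK session id (stands in for a TCS)."""

    def __init__(self) -> None:
        self.pending: dict[str, asyncio.Future] = {}

    async def send_prompt_and_wait(self, session_id: str, timeout: float) -> str:
        fut = asyncio.get_running_loop().create_future()
        self.pending[session_id] = fut
        # Without a timeout this await would hang forever if nobody resolves fut.
        return await asyncio.wait_for(fut, timeout)

    def complete(self, session_id: str, result: str) -> bool:
        """Called when a worker finishes; returns False if nobody waits on this id."""
        fut = self.pending.pop(session_id, None)
        if fut is None or fut.done():
            return False
        fut.set_result(result)
        return True


async def simulate_relaunch() -> tuple[bool, str]:
    orch = Orchestrator()
    waiter = asyncio.create_task(
        orch.send_prompt_and_wait("sdk-session-old", timeout=0.1)
    )
    await asyncio.sleep(0)  # let the waiter register its future
    # After the relaunch, the worker reports completion under its NEW session id.
    delivered = orch.complete("sdk-session-new", "worker result")
    try:
        outcome = await waiter
    except asyncio.TimeoutError:
        outcome = "timed out: completion went to a session nobody awaited"
    return delivered, outcome


if __name__ == "__main__":
    print(asyncio.run(simulate_relaunch()))
```

In this model, `complete` returns `False` because the completion signal is addressed to a session the orchestrator never registered.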
Observed behavior
In the PP- IC Things orchestration:
- Worker-1 was dispatched and started doing work
- The worker called `relaunch.sh` to rebuild PolyPilot with code changes
- After the relaunch, the worker's session was restored, but the orchestrator never received the completion signal
- The orchestrator's reflection loop hung waiting for a worker result that would never come
Expected behavior
When the app relaunches mid-orchestration:
- The pending orchestration should be detected and resumed via `PendingOrchestration`
- Worker results from before the relaunch should be recoverable
- OR the orchestrator should detect the relaunch and re-dispatch the worker
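The expected behavior can be approached by persisting a small record before dispatching and checking for it on startup. The Python sketch below illustrates this under explicit assumptions. `PendingRecord`, `save_pending`, and `resume_after_relaunch` are hypothetical names. The JSON file stands in for however `PendingOrchestration` is actually stored, and its real fields are not known here. `dispatch` is a caller-supplied function that sends the prompt to a fresh worker session and returns the result. If a result was recorded before the relaunch, it is recovered. Otherwise the worker is re-dispatched.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class PendingRecord:
    """Hypothetical on-disk record; not the real PendingOrchestration schema."""
    orchestration_id: str
    worker: str
    prompt: str
    result: Optional[str] = None  # set if the worker finished before the relaunch


def save_pending(path: str, record: PendingRecord) -> None:
    # Write to a temp file in the same directory, then atomically replace,
    # so a relaunch mid-write cannot leave a half-written record behind.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(record), f)
    os.replace(tmp, path)


def resume_after_relaunch(
    path: str, dispatch: Callable[[str, str], str]
) -> Optional[str]:
    """On startup: recover a pre-relaunch result, or re-dispatch the worker."""
    if not os.path.exists(path):
        return None  # nothing was in flight when the app restarted
    with open(path) as f:
        record = PendingRecord(**json.load(f))
    if record.result is None:
        # The old wait died with the previous process; dispatch again on a new session.
        record.result = dispatch(record.worker, record.prompt)
    os.remove(path)  # this orchestration step is done; clear the marker
    return record.result


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "pending.json")
    save_pending(path, PendingRecord("orch-1", "worker-1", "rebuild and test"))
    print(resume_after_relaunch(path, lambda w, p: f"{w} redone: {p}"))
```

The key design choice is that the orchestrator's source of truth lives on disk rather than in an in-memory TCS. A restart therefore loses only the wait, not the knowledge that a wait was in progress.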
Notes
This is related to but distinct from the server idle timeout issue (#396). In this case the app itself is restarted, not just the server killing an idle session.