What happened
A container was killed by the sleepAfter alarm while containerFetch() was actively awaiting a response from the container. The operation (a matplotlib chart generation via Sandbox SDK's exec(), which uses containerFetch internally) had been running for approximately 6 seconds when the container received SIGTERM.
Client-side error:
Error: ReadableStream received over RPC disconnected prematurely.
Container-side error:
SessionDestroyedError: Session 'main' was destroyed during command execution
Note: the exact error messages come from Sandbox SDK; I'm sharing them here for illustration.
Reproduction
- Create a DO subclass extending Container
- Set sleepAfter to a short duration (e.g., "10s")
- Call containerFetch() with a request that takes longer than sleepAfter to complete (e.g., a command execution that runs for 15+ seconds)
- The alarm fires while containerFetch() is still awaiting the response, and onActivityExpired() kills the container
This also affects back-to-back operations: if several quick operations consume most of the sleepAfter window, a subsequent longer operation can be killed even though it just started, because renewActivityTimeout() sets the deadline to now + sleepAfter at the moment it is called, not to sleepAfter after the operation finishes.
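To make the timing concrete, here is a simplified model of the deadline arithmetic. It is not the library's implementation: the function names and the `Operation` shape are hypothetical, and it assumes the last renewal happened at t = 0 on an illustrative clock measured in seconds.

```typescript
// Simplified model only; names here are illustrative, not part of @cloudflare/containers.
interface Operation {
  startedAt: number;       // seconds on a hypothetical clock
  durationSeconds: number; // how long the operation needs to finish
}

// The deadline is fixed at renewal time: renewedAt + sleepAfter.
function deadlineAfterRenewal(renewedAt: number, sleepAfterSeconds: number): number {
  return renewedAt + sleepAfterSeconds;
}

// An operation is killed mid-flight if the deadline falls between its start and its end.
function isKilledMidFlight(op: Operation, deadline: number): boolean {
  const finishesAt = op.startedAt + op.durationSeconds;
  return op.startedAt <= deadline && deadline < finishesAt;
}

const modelDeadline = deadlineAfterRenewal(0, 10); // sleepAfter = "10s", renewed at t = 0

const longOp: Operation = { startedAt: 0, durationSeconds: 15 };  // the 15+ second command
const lateOp: Operation = { startedAt: 8, durationSeconds: 6 };   // starts after quick operations
const shortOp: Operation = { startedAt: 0, durationSeconds: 3 };  // finishes well inside the window

console.log(isKilledMidFlight(longOp, modelDeadline));  // true
console.log(isKilledMidFlight(lateOp, modelDeadline));  // true
console.log(isKilledMidFlight(shortOp, modelDeadline)); // false
```

In this model the late operation is killed two seconds after it starts, which matches the back-to-back case described above.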
Expected behaviour
The sleepAfter timer should not fire while containerFetch() is actively awaiting a response. The container is clearly in use when there's an in-flight request.
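One way the expected behaviour could be approximated is by counting in-flight requests and having the expiry path renew instead of stop while the count is non-zero. The sketch below is purely illustrative: `InFlightTracker`, `track()`, and `onTimerFired()` are hypothetical names, not part of @cloudflare/containers, and how such a check would hook into onActivityExpired() is left open.

```typescript
// Hypothetical sketch; none of these names exist in @cloudflare/containers.
class InFlightTracker {
  private inFlight = 0;

  // Wrap an operation so it counts as in flight until it settles.
  async track<T>(operation: () => Promise<T>): Promise<T> {
    this.inFlight++;
    try {
      return await operation();
    } finally {
      this.inFlight--;
    }
  }

  get busy(): boolean {
    return this.inFlight > 0;
  }

  // What an expiry handler could decide when the timer fires:
  // keep the container alive while a request is pending, stop it otherwise.
  onTimerFired(): "renew" | "stop" {
    return this.busy ? "renew" : "stop";
  }
}
```

With something like this, a timer that fires during a pending request would push the deadline out rather than send SIGTERM, and subclasses would not need to predict operation durations.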
Current workaround
Before each long-running operation, extend the sleepAfter property and refresh the timeout:
this.sleepAfter = estimatedDurationSeconds + 30;
this.renewActivityTimeout();
This works but requires subclasses to predict operation duration and manage the timeout around every containerFetch() call.
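The workaround can be wrapped in a small helper so the bookkeeping lives in one place. This is a minimal sketch, not an official API: `withExtendedTimeout` and `ActivityTimeoutTarget` are hypothetical names, the `string | number` type for sleepAfter is an assumption based on the values used in this issue, and restoring the previous value afterwards is an extra design choice that the workaround above does not include.

```typescript
// Structural type covering only the two members the workaround touches.
// Assumption: sleepAfter accepts a duration string ("10s") or a number of seconds.
interface ActivityTimeoutTarget {
  sleepAfter: string | number;
  renewActivityTimeout(): void;
}

// Hypothetical helper: extend the timeout, run the operation, then restore the old value.
async function withExtendedTimeout<T>(
  target: ActivityTimeoutTarget,
  estimatedDurationSeconds: number,
  operation: () => Promise<T>,
  marginSeconds = 30,
): Promise<T> {
  const previous = target.sleepAfter;
  target.sleepAfter = estimatedDurationSeconds + marginSeconds;
  target.renewActivityTimeout(); // deadline is now: now + extended sleepAfter
  try {
    return await operation();
  } finally {
    // Restore the original window and restart it from the moment the operation ends.
    target.sleepAfter = previous;
    target.renewActivityTimeout();
  }
}
```

Inside a Container subclass this might be called as `withExtendedTimeout(this, 15, () => this.containerFetch(request))`. Overlapping calls would overwrite each other's sleepAfter value, so the helper is only safe when long operations run one at a time.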
Potentially related issues
Environment
Observed in production using v0.1.1 of @cloudflare/containers