Agent Diagnostic
Investigated the failed job 67691440936 from workflow run #87 (Release Dev) on commit 4878b9b (main branch).
- Failure: `e2e / E2E` job, step "Run E2E tests", attempt 1.
- Test: `e2e/python/test_sandbox_api.py::test_sandbox_api_crud_and_exec`
- Error: `grpc._channel._InactiveRpcError` with `StatusCode.DEADLINE_EXCEEDED` during `CreateSandbox` RPC call.
- Result: 1 failed, 67 passed out of 68 tests. The run was automatically retried (attempt 2).
- Prior runs: The three preceding Release Dev workflow runs all had E2E jobs pass successfully, confirming this is an intermittent failure, not a regression introduced by the commit.
- Root cause hypothesis: The `CreateSandbox` gRPC call times out before the sandbox is provisioned. Possible causes include resource contention on the `build-arm64` runner, slow container startup after cluster bootstrap, or an insufficient default timeout in the Python SDK's `create` method.
Description
Actual behavior: test_sandbox_api_crud_and_exec fails intermittently with a DEADLINE_EXCEEDED gRPC error when calling sandbox(delete_on_exit=True), which invokes client.create_session(spec=self._spec) → self._stub.CreateSandbox(...).
Expected behavior: The test should reliably create a sandbox within the configured timeout, or have retry/backoff logic to handle transient delays in sandbox provisioning.
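The retry/backoff behavior described above could be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `retry_transient`, `is_transient`, and the default attempt count and delay are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transient(
    call: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); retry with exponential backoff while is_transient(exc) holds.

    Hypothetical helper for illustration only; not part of the openshell SDK.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt, or on errors that are not transient.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

With grpcio, the predicate could plausibly check `isinstance(exc, grpc.RpcError)` and compare `exc.code()` against `grpc.StatusCode.DEADLINE_EXCEEDED` or `grpc.StatusCode.UNAVAILABLE`. One caveat: a client-side deadline does not prove the server abandoned the request, so retrying `CreateSandbox` may need to account for a sandbox that was created after the first call timed out.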
Reproduction Steps
- Observe the failed job logs from Release Dev run #87, attempt 1.
- The failure occurs at `e2e/python/test_sandbox_api.py:29` inside the `with sandbox(delete_on_exit=True)` context manager.
Environment
- Runner: `build-arm64` (ARM64 self-hosted)
- Workflow: `release-dev.yml` (Release Dev #87)
- Commit: `4878b9b086d6af9c24e93de14ff644652086f30c`
Logs
e2e/python/test_sandbox_api.py:29: in test_sandbox_api_crud_and_exec
with sandbox(delete_on_exit=True) as sb:
python/openshell/sandbox.py:471: in __enter__
self._session = client.create_session(spec=self._spec)
python/openshell/sandbox.py:206: in create_session
return SandboxSession(self, self.create(spec=spec))
python/openshell/sandbox.py:193: in create
response = self._stub.CreateSandbox(
...
E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.DEADLINE_EXCEEDED
E details = "Deadline Exceeded"
E debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4}"
E >
FAILED e2e/python/test_sandbox_api.py::test_sandbox_api_crud_and_exec
=================== 1 failed, 67 passed in 67.49s (0:01:07) ====================
Possible Mitigations
- Increase the gRPC deadline for `CreateSandbox` in the E2E test fixtures or the Python SDK client.
- Add a warm-up/readiness check after cluster bootstrap before running tests.
- Add retry logic to the `sandbox` fixture for transient gRPC errors.
- Investigate if ARM64 runner resource contention contributes to the timeout.
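As a rough sketch of the warm-up/readiness idea, a fixture could poll a cheap probe until the cluster responds before any sandbox is created. The function name `wait_until_ready`, the `probe` callable, and the default timeout and interval are all assumptions made for illustration. What the probe should actually call (a health RPC, a list call, or something else) depends on what the gateway exposes and is not specified here.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll probe() until it returns True; raise TimeoutError after timeout seconds.

    Hypothetical helper for illustration only; clock and sleep are injectable
    so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {timeout:.0f}s")
        sleep(interval)
```

Called once from a session-scoped fixture, a check like this would separate "cluster not ready yet" failures from real `CreateSandbox` timeouts, which should make the remaining intermittent failures easier to attribute.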