
fix(codex): make agent home dirs vscode-owned so Codex can create state_*.sqlite#117

Merged
madarco merged 2 commits into nightly from fix/codex-sqlite-dir-ownership
Jun 26, 2026


madarco (Owner) commented Jun 26, 2026

Problem

We stopped seeding Codex's state_*.sqlite index (commit 2eaf2b428, "fix(codex): don't seed Codex's session-state DBs into boxes"). Codex now creates that index at startup instead of receiving it pre-uploaded — and the create failed with a permission error because the target directory wasn't owned by vscode (the user the agent runs as). A box agent diagnosed it live: "/home/vscode/.codex was owned by node:node, and the uploaded session directory was root:root, so the vscode user couldn't create state_5.sqlite."

Two distinct ownership defects, each blocking a different path:

  1. Agent home dirs not reliably vscode-owned in cloud templates. Breaks even a plain agentbox codex start: Codex writes ~/.codex/state_*.sqlite at the top level and can't if ~/.codex itself is node:node (E2B's base image ships a node user; the root npm install -g @openai/codex bake step left it that way).
  2. Upload primitives only chowned the final landed path, not the parent chain they mkdir -p'd as root. Session-teleport lands a rollout at ~/.codex/sessions/YYYY/MM/DD/, leaving that chain root:root so Codex can't write a new rollout or its sqlite index. This is the same bug already fixed for carry in carry.ts:144-156; that fix was never ported to the upload path.

Fix

  • agent-credentials.ts — new ensureAgentHomeDirsOwned(): a cheap, idempotent create-time chown -R vscode:vscode over ~/.codex, ~/.claude, ~/.local/share/opencode. Fixes existing prepared templates without a re-bake (preferred over a Dockerfile change). chown -R doesn't deref symlinks, so the baked credential symlinks are untouched.
  • cloud-provider.ts — calls it unconditionally after seedAgentVolumesIfFresh, before teleport/agent launch.
  • cloud-cp.ts (uploadToCloudBox) — after the final-path chown, walks the parent chain up to /home/vscode (exclusive), gated on dest under home. The bash -c wrapping protects $(...)/while from Vercel's outer sudo -u vscode -H bash -lc exec nesting (no per-backend carve-out needed).
  • box-cp.ts (docker uploadToBox) — equivalent parent-walk via docker exec --user root (fix across all providers).
  • cloud-cp.test.ts — asserts the parent-walk is present under /home/vscode/ and absent for /etc/* and /workspace/*.
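
As an illustration of the create-time step (Part 1), here is a minimal sketch of how a best-effort ownership command could be assembled. The function names, the `shellQuote` helper, and the `2>/dev/null || true` tolerance are assumptions made for this example; this is not the actual body of ensureAgentHomeDirsOwned in agent-credentials.ts.

```typescript
// Hypothetical sketch: names and structure are illustrative only.
const BOX_HOME = "/home/vscode";

// Directories named in the PR description, relative to the agent home.
const AGENT_HOME_DIRS = [".codex", ".claude", ".local/share/opencode"];

// Single-quote a value for POSIX shells (assumed helper, not from the PR).
function shellQuote(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// Build one shell line that chowns each agent dir by user/group name.
// Each chown is followed by `|| true` so a failure on one directory
// (for example a missing dir or a read-only mount) does not abort the rest.
function buildEnsureOwnedCommand(owner: string = "vscode:vscode"): string {
  return AGENT_HOME_DIRS
    .map((rel) => {
      const dir = shellQuote(`${BOX_HOME}/${rel}`);
      return `chown -R ${owner} ${dir} 2>/dev/null || true`;
    })
    .join("; ");
}

console.log(buildEnsureOwnedCommand());
```

Because the owner is passed by name rather than by numeric uid, the same command string would apply unchanged on providers where the vscode uid differs.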

Both parts are needed: Part 1 runs at create (before teleport); Part 2 fixes the post-create teleport upload. Plain codex starts never hit the upload path, so they rely on Part 1.
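
To make Part 2 concrete, the sketch below computes which directories a parent walk would visit for a given upload destination. It is a host-side illustration only: the PR performs the walk in shell inside the box, and `parentDir`, `parentChainUnderHome`, and the rollout file name are hypothetical. It assumes the destination path is already resolved and normalized.

```typescript
// Hypothetical sketch of the parent walk's reach; not the PR's shell script.
const BOX_HOME = "/home/vscode";

// Minimal POSIX-style dirname for absolute, normalized paths.
function parentDir(p: string): string {
  const i = p.lastIndexOf("/");
  return i <= 0 ? "/" : p.slice(0, i);
}

// List the ancestors of finalPath, nearest first, stopping before `home`.
// Destinations outside `home` (or equal to it) yield an empty chain.
function parentChainUnderHome(finalPath: string, home: string = BOX_HOME): string[] {
  if (!finalPath.startsWith(home + "/")) return [];
  const chain: string[] = [];
  let parent = parentDir(finalPath);
  while (parent !== home && parent !== "/") {
    chain.push(parent);
    parent = parentDir(parent);
  }
  return chain;
}

console.log(parentChainUnderHome("/home/vscode/.codex/sessions/2026/06/26/rollout.jsonl"));
console.log(parentChainUnderHome("/etc/hosts"));
```

For the session path, the chain runs from `/home/vscode/.codex/sessions/2026/06/26` up to `/home/vscode/.codex` (five directories). For `/etc/hosts` it is empty, matching the "gated on dest under home" behaviour described above.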

Chowns are name/id-derived (vscode / id -un), not hardcoded 1000, because the vscode uid varies per provider (see results).

Verification

pnpm build, pnpm lint, full pnpm test (916+ tests across 25 packages) all green, including the new cloud-cp.test.ts.

Live, end-to-end on every available provider (created a box from a test repo, then checked ownership + write-probes; agentbox cp exercises the same upload primitive session-teleport uses):

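| Provider | vscode uid | Part 1: ~/.codex writable (created state_probe.sqlite) | Part 2: teleport sessions/ ancestors vscode-owned + sibling writable |
|----------|------------|---------------------------------------------------------|-----------------------------------------------------------------------|
| docker   | 1000       |                                                         |                                                                       |
| e2b      | 1002       |                                                         |                                                                       |
| hetzner  | 1000       |                                                         |                                                                       |
| vercel   | 1001       |                                                         |                                                                       |
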
  • e2b is the originally-reported provider — now green on both parts.
  • vercel confirms the bash -c wrapping survives Vercel's exec nesting.
  • Daytona wasn't exercised (no API key / prepared snapshot on the host); the same code paths apply, with the FUSE-volume || true tolerance already in place.

Notable finding: the in-box vscode uid differs per provider (docker/hetzner=1000, vercel=1001, e2b=1002) because each base image reserves 1000 differently — exactly why the fix chowns by name, not a hardcoded 1000.

All four verification boxes destroyed; no orphan sandboxes/servers left behind.

https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2


Note

Medium Risk
Changes ownership normalization on every cloud create and on host→box uploads for paths under the agent home; failures are tolerated but mis-chown could still leave edge cases on FUSE/read-only mounts.

Overview
Fixes EACCES when Codex creates state_*.sqlite at startup (no longer pre-seeded): agent homes and upload-created paths must be writable by vscode.

Create-time: Adds ensureAgentHomeDirsOwned — best-effort chown -R vscode:vscode on ~/.codex, ~/.claude, and OpenCode’s data dir — and runs it on every cloud box create after credential seeding so wrong template owners (e.g. node:node on E2B) don’t block Codex.

Upload-time: uploadToCloudBox and docker uploadToBox now chown the landed path and walk parent directories up to /home/vscode when the destination is under home, so root-owned mkdir -p chains from session teleport / agentbox cp don’t block sibling writes (e.g. sqlite under ~/.codex/sessions/...). Paths under /etc or /workspace skip the parent walk.

Exports the new helper from sandbox-cloud; cloud-cp.test.ts asserts the parent-walk script for home vs non-home destinations.

Reviewed by Cursor Bugbot for commit 090f9f6.

…te_*.sqlite

We stopped seeding Codex's state_*.sqlite index (commit 2eaf2b4), so Codex
now creates it at startup instead of receiving it pre-uploaded. The create
failed with a permission error because the directory wasn't owned by vscode
(the user the agent runs as). Two distinct ownership defects:

1. The agent home dirs (~/.codex, ~/.claude, ~/.local/share/opencode) were not
   reliably vscode-owned in cloud templates (E2B's base image ships a `node`
   user; the root `npm install -g @openai/codex` bake step left ~/.codex as
   node:node). This breaks even a plain `agentbox codex` start. Fixed with a
   cheap, idempotent create-time chown (ensureAgentHomeDirsOwned) — no re-bake.

2. The upload primitives only chowned the final landed path, not the parent
   directory chain they mkdir -p'd as root. Session-teleport lands a rollout at
   ~/.codex/sessions/YYYY/MM/DD/, leaving that chain root-owned so Codex can't
   write a new rollout / its sqlite index. Mirror the carry.ts parent-walk fix
   in both upload primitives (cloud-cp.ts + docker box-cp.ts), gated on the dest
   being under /home/vscode.

Chowns are name/id-derived (vscode / id -un), not hardcoded 1000, since the
vscode uid varies per provider (docker/hetzner=1000, vercel=1001, e2b=1002).

Claude-Session: https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2

vercel Bot commented Jun 26, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
agentbox-web Skipped Skipped Jun 26, 2026 11:19am


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 090f9f6. Configure here.

`while [ "$parent" != ${quoteShellArg(BOX_HOME)} ] && [ "$parent" != "/" ]; do ` +
`$SUDO chown "$(id -un):$(id -gn)" "$parent" || true; ` +
`parent=$(dirname "$parent"); ` +
`done`


Parent walk chowns /home

Medium Severity

When an upload’s resolved finalPath is exactly /home/vscode, the new parent-chain chown treats that as under home, sets parent to /home, and the loop condition only excludes /home/vscode, so /home itself can be reassigned to the agent user. The existing carry path avoids this by requiring destinations under BOX_HOME/ with a trailing segment, not equality with BOX_HOME.


Bugbot: when an upload's resolved finalPath was exactly /home/vscode, the
`=== BOX_HOME` branch of the gate let the parent walk run with dirname=/home,
reassigning /home itself to the agent user. Gate strictly on
`startsWith(BOX_HOME + '/')` (a trailing segment), matching carry.ts. Applies
to both cloud-cp.ts and docker box-cp.ts; adds a regression test.

Claude-Session: https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2
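
The gating change in the follow-up commit can be illustrated with two small predicates. `looseGate` is a reconstruction of the flagged shape based only on the commit message's mention of an `=== BOX_HOME` branch, and `strictGate` mirrors the `startsWith(BOX_HOME + '/')` condition it names; both function names are hypothetical and neither is copied from cloud-cp.ts or box-cp.ts.

```typescript
// Hypothetical sketch contrasting the two gate shapes.
const BOX_HOME = "/home/vscode";

// Flagged shape (assumed): equality with BOX_HOME also passes the gate,
// so a parent walk would start at dirname("/home/vscode") === "/home".
function looseGate(finalPath: string): boolean {
  return finalPath === BOX_HOME || finalPath.startsWith(BOX_HOME + "/");
}

// Fixed shape: require at least one trailing segment under BOX_HOME.
function strictGate(finalPath: string): boolean {
  return finalPath.startsWith(BOX_HOME + "/");
}

for (const p of ["/home/vscode", "/home/vscode/.codex/sessions", "/home/vscodex/x", "/etc/hosts"]) {
  console.log(p, { loose: looseGate(p), strict: strictGate(p) });
}
```

The only input on which the two gates disagree is `/home/vscode` itself, which is the case the review flagged. Appending `/` before the prefix check also keeps look-alike paths such as `/home/vscodex/x` out of the walk.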
@madarco madarco merged commit ff13b7a into nightly Jun 26, 2026
3 checks passed
@madarco madarco deleted the fix/codex-sqlite-dir-ownership branch June 26, 2026 11:23