Support git-lfs repos in remote (cloud) boxes#114

Merged: madarco merged 1 commit into nightly from feat/cloud-git-lfs on Jun 25, 2026

@madarco madarco commented Jun 25, 2026


Extends the docker git-lfs fix (#112) to the cloud providers — daytona, hetzner, vercel, e2b. Targets nightly because it builds directly on #112 (which is only on nightly).

Problem

Cloud boxes seed /workspace from a host-side shallow git clone --no-checkout file://<hostRepo> → tar .git/ → upload → in-box git checkout. A plain clone never populates .git/lfs/objects, and cloud boxes have no host git credentials (and, on hetzner, locked egress). So an LFS-tracked repo either checked out with broken pointer files or failed the seed outright (the in-box smudge hits the upstream LFS endpoint unauthenticated → error under set -e). Docker dodges this via its bind-mounted, shared .git/lfs; cloud has no bind mount.
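For readers unfamiliar with what a "broken pointer file" looks like: when the real object is absent from .git/lfs/objects, the working tree holds a small text stub in the git-lfs pointer format (a version line, an `oid sha256:` line, and a `size` line). A minimal sketch of recognising one follows; `parseLfsPointer` and `LfsPointer` are hypothetical names used for illustration and are not part of this PR.

```typescript
// Illustrative only: parseLfsPointer is a hypothetical helper, not code from this PR.
export interface LfsPointer {
  oid: string;  // sha256 hex digest of the real content
  size: number; // byte length of the real content
}

const POINTER_VERSION = "version https://git-lfs.github.com/spec/v1";

// Returns the parsed pointer, or null when the text is not a git-lfs pointer
// (for example, when the file already holds real smudged content).
export function parseLfsPointer(text: string): LfsPointer | null {
  if (!text.startsWith(POINTER_VERSION)) return null;
  const oid = /^oid sha256:([0-9a-f]{64})$/m.exec(text);
  const size = /^size (\d+)$/m.exec(text);
  if (!oid || !size) return null;
  return { oid: oid[1], size: Number(size[1]) };
}
```

A checkout that still parses as a pointer after the seed indicates the object was not available to the in-box smudge.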

Scope

Read/seed parity with docker — LFS repos check out with real content at create and checkpoint-restore. Push-back of box-created LFS objects and lazy on-demand git lfs pull are intentionally out of scope (both need a relay LFS transport; follow-up). For docker, push-back already works as an emergent property of the shared .git/lfs + host-side git push.

Layer 1 — base images (git-lfs binary + system filter)

  • hetzner (install-box.sh) / e2b (build-template.sh): add git-lfs to the package list + git lfs install --system --skip-repo.
  • vercel (provision.sh): git-lfs as a separate best-effort step (not the atomic base dnf transaction — AL2023 may not carry it; falls back to the git-lfs packagecloud rpm repo) + system filter.
  • daytona: inherits git-lfs from Dockerfile.box (Support git-lfs repos in the box #112) — comment only.
  • --system is required: cloud boxes have no bind-mounted ~/.gitconfig carrying filter.lfs.process.
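One way to confirm that a baked image carries the system-wide filter is to inspect the output of `git config --system --list` inside the box. The sketch below is an assumption about how such a check could look, not something this PR ships; `hasSystemLfsFilter` is a hypothetical name. It relies only on the keys that `git lfs install` writes (filter.lfs.clean, filter.lfs.smudge, filter.lfs.process).

```typescript
// Illustrative only: hasSystemLfsFilter is a hypothetical helper, not code from this PR.
// Input: the raw text printed by `git config --system --list` (one key=value per line).
export function hasSystemLfsFilter(configList: string): boolean {
  const entries = new Map<string, string>();
  for (const line of configList.split("\n")) {
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip blank or malformed lines
    entries.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  // All three filter commands must point at the git-lfs binary.
  const required = ["filter.lfs.clean", "filter.lfs.smudge", "filter.lfs.process"];
  return required.every((key) => (entries.get(key) ?? "").startsWith("git-lfs"));
}
```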

Layer 2 — host-side working-set seeding (sandbox-cloud/workspace-seed.ts)

  • seedCloneLfsObjects: probe git lfs ls-files, best-effort git lfs fetch origin <ref> (host holds the creds), then copy only the checkout ref's content-addressed object blobs (.git/lfs/objects/aa/bb/<oid>) into the clone so they ride the existing workspace.tar.gz. The in-box checkout then smudges real content with zero box network/creds. Bounded to the working set; best-effort (missing oid → pointer, never fails the seed). Wired into both seedFromGitClone call sites (incl. the adaptive-depth rebuild).
  • Checkpoint-restore delta path: ships only the oids the delta introduces (target \ checkpointTip) as agentbox-delta-lfs.tar.gz, extracted into the box's .git before the reset.
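The two ideas in this layer, the content-addressed object path and the delta set (target \ checkpointTip), can be sketched as below. This is a minimal illustration under stated assumptions: the real `lfsObjectRelPath` in workspace-seed.ts is not shown in this PR page, so its signature and whether it includes the `.git/` prefix are assumed here, and `deltaOids` is a hypothetical name.

```typescript
// Illustrative only: signatures are assumptions, not the PR's actual code.

// Maps a sha256 oid to its git-lfs store location: objects/<oid[0:2]>/<oid[2:4]>/<oid>.
export function lfsObjectRelPath(oid: string): string {
  if (!/^[0-9a-f]{64}$/.test(oid)) {
    throw new Error(`not a sha256 oid: ${oid}`);
  }
  return `.git/lfs/objects/${oid.slice(0, 2)}/${oid.slice(2, 4)}/${oid}`;
}

// Set difference target \ checkpointTip: the oids the delta introduces,
// de-duplicated and in first-seen order.
export function deltaOids(
  target: Iterable<string>,
  checkpointTip: Iterable<string>,
): string[] {
  const alreadyInBox = new Set(checkpointTip);
  return Array.from(new Set(target)).filter((oid) => !alreadyInBox.has(oid));
}
```

Only the paths produced for the delta oids would need to go into the delta archive; objects already present at the checkpoint tip are skipped.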

Verification — live cloud e2e ✅

Re-baked (agentbox prepare -f) and created a box on each provider against the ssh LFS fixture (../agentbox-test-repo, sample.bin oid fc270de147…), then asserted in-box. All three pass identically: sample.bin is real 524288-byte content (not a pointer), sha256 == the host oid, git lfs ls-files shows it downloaded, system filter registered, git status clean:

| Provider | git-lfs in image | seed log | in-box sha256 | result |
| --- | --- | --- | --- | --- |
| e2b | 3.3.0 (Debian apt) | seeded 1 git-lfs object(s) for HEAD | fc270de147… | real content |
| hetzner | 3.4.1 (Ubuntu apt) | seeded ✓ | fc270de147… | real content |
| vercel | dnf (AL2023; no packagecloud fallback needed) | seeded 1 git-lfs object(s) for HEAD | fc270de147… | real content |
  • daytona not run (no creds in this env), but it shares the exact same code path — git-lfs inherited from Dockerfile.box + the same host-side seedCloneLfsObjects.
  • Also a local A/B PoC (unreachable origin) confirming the credential-less checkout fails without the fix and smudges real content with it.
  • Build + lint + 77 unit tests green (added a lfsObjectRelPath path-layout test).
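The core in-box assertion above (real content whose sha256 equals the LFS oid) could be expressed roughly as follows. This is a sketch, not the script used for the live runs; `checkSmudged` and its result shape are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Illustrative only: checkSmudged is a hypothetical helper, not code from this PR.
export interface SmudgeCheck {
  ok: boolean;
  reason?: string;
}

// Verifies that checked-out bytes are the real LFS object: the size matches
// and the sha256 digest equals the oid recorded in the pointer.
export function checkSmudged(
  content: Uint8Array,
  expectedOid: string,
  expectedSize: number,
): SmudgeCheck {
  if (content.byteLength !== expectedSize) {
    // A leftover pointer file is far smaller than the object it stands for.
    return { ok: false, reason: `size ${content.byteLength} != ${expectedSize}` };
  }
  const actual = createHash("sha256").update(content).digest("hex");
  if (actual !== expectedOid) {
    return { ok: false, reason: `sha256 ${actual} != ${expectedOid}` };
  }
  return { ok: true };
}
```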

https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs

Extends the docker git-lfs fix (PR #112) to the cloud providers (daytona,
hetzner, vercel, e2b). Cloud boxes seed the workspace via a shallow
`git clone --no-checkout file://<hostRepo>` + tar, which never populates
`.git/lfs/objects`, and have no host credentials / (hetzner) locked egress,
so an LFS-tracked repo checked out with broken pointer files (or failed the
seed under `set -e`). Read/seed parity with docker; push-back + lazy fetch
are intentionally out of scope (need a relay LFS transport).

Layer 1 — base images install git-lfs + register the system filter:
- hetzner/install-box.sh, e2b/build-template.sh: add `git-lfs` to the package
  list + `git lfs install --system --skip-repo`.
- vercel/provision.sh: install git-lfs as a separate best-effort step (NOT the
  atomic base dnf transaction — AL2023 may lack it; falls back to the git-lfs
  packagecloud repo) + system filter.
- daytona inherits git-lfs from Dockerfile.box (comment only).
- `--system` is required because cloud boxes (unlike docker) have no
  bind-mounted ~/.gitconfig carrying filter.lfs.process.

Layer 2 — host-side working-set seeding (sandbox-cloud/workspace-seed.ts):
- seedCloneLfsObjects: probe `git lfs ls-files`, best-effort
  `git lfs fetch origin <ref>`, then copy ONLY the checkout ref's
  content-addressed object blobs (.git/lfs/objects/aa/bb/<oid>) into the clone
  so they ride the existing workspace tar. The in-box checkout then smudges
  real content with zero box network/creds. Bounded to the working set,
  fully best-effort (missing oid -> pointer, never fails the seed). Wired into
  both seedFromGitClone call sites (incl. the adaptive-depth rebuild).
- Checkpoint-restore delta path ships only the oids the delta introduces
  (target \ checkpointTip) as agentbox-delta-lfs.tar.gz, extracted into the
  box's .git before the reset.
- git-identity.ts: clarifying comment (filter is system-wide; no per-box config).

Verified locally with an A/B PoC simulating the cloud seed against an LFS
fixture with an unreachable origin: without the fix the credential-less
checkout's smudge fails (no object); with it the seeded objects let the
checkout smudge real content (sha256 matches the oid). Build + lint + unit
tests green (added lfsObjectRelPath path-layout test). Per-provider live
cloud bakes still pending.

Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs

@madarco madarco merged commit bb2f954 into nightly Jun 25, 2026
4 checks passed