Support git-lfs repos in remote (cloud) boxes #114
Merged
Extends the docker git-lfs fix (PR #112) to the cloud providers (daytona, hetzner, vercel, e2b). Cloud boxes seed the workspace via a shallow `git clone --no-checkout file://<hostRepo>` + tar, which never populates `.git/lfs/objects`, and have no host credentials / (hetzner) locked egress, so an LFS-tracked repo checked out with broken pointer files (or failed the seed under `set -e`).

Read/seed parity with docker; push-back + lazy fetch are intentionally out of scope (need a relay LFS transport).

Layer 1 - base images install git-lfs + register the system filter:
- hetzner/install-box.sh, e2b/build-template.sh: add `git-lfs` to the package list + `git lfs install --system --skip-repo`.
- vercel/provision.sh: install git-lfs as a separate best-effort step (NOT the atomic base dnf transaction: AL2023 may lack it; falls back to the git-lfs packagecloud repo) + system filter.
- daytona inherits git-lfs from Dockerfile.box (comment only).
- `--system` is required because cloud boxes (unlike docker) have no bind-mounted ~/.gitconfig carrying filter.lfs.process.

Layer 2 - host-side working-set seeding (sandbox-cloud/workspace-seed.ts):
- seedCloneLfsObjects: probe `git lfs ls-files`, best-effort `git lfs fetch origin <ref>`, then copy ONLY the checkout ref's content-addressed object blobs (.git/lfs/objects/aa/bb/<oid>) into the clone so they ride the existing workspace tar. The in-box checkout then smudges real content with zero box network/creds. Bounded to the working set, fully best-effort (missing oid -> pointer, never fails the seed). Wired into both seedFromGitClone call sites (incl. the adaptive-depth rebuild).
- Checkpoint-restore delta path ships only the oids the delta introduces (target \ checkpointTip) as agentbox-delta-lfs.tar.gz, extracted into the box's .git before the reset.
- git-identity.ts: clarifying comment (filter is system-wide; no per-box config).
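The `.git/lfs/objects/aa/bb/<oid>` layout that `seedCloneLfsObjects` copies into the clone can be sketched as follows. This is a minimal illustration, not the PR's code: `lfsObjectRelPathSketch` and `planLfsCopies` are hypothetical names, and the only assumption taken from git-lfs itself is that an object lives under the first two and the next two hex characters of its sha256 oid.

```typescript
// Hypothetical helper (not the PR's lfsObjectRelPath): maps a git-lfs oid
// (sha256 hex) to its object path relative to the .git directory.
function lfsObjectRelPathSketch(oid: string): string {
  if (!/^[0-9a-f]{64}$/.test(oid)) {
    throw new Error(`not a sha256 git-lfs oid: ${oid}`);
  }
  return `lfs/objects/${oid.slice(0, 2)}/${oid.slice(2, 4)}/${oid}`;
}

// Hypothetical planner: turns the working-set oids into a de-duplicated list
// of relative paths to copy. Malformed oids are skipped rather than thrown,
// mirroring the "best-effort, never fails the seed" behaviour described above.
function planLfsCopies(oids: readonly string[]): string[] {
  const paths = new Set<string>();
  for (const oid of oids) {
    try {
      paths.add(lfsObjectRelPathSketch(oid));
    } catch {
      // best-effort: the file simply stays a pointer in the box
    }
  }
  return Array.from(paths);
}

// Example with a made-up oid (64 hex characters).
const exampleOid = "0123456789abcdef".repeat(4);
console.log(planLfsCopies([exampleOid, "not-an-oid", exampleOid]));
```

Skipping invalid entries instead of throwing keeps a single bad oid from aborting the whole seed; the cost is that the corresponding file checks out as a pointer.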
Verified locally with an A/B PoC simulating the cloud seed against an LFS fixture with an unreachable origin: without the fix the credential-less checkout's smudge fails (no object); with it the seeded objects let the checkout smudge real content (sha256 matches the oid). Build + lint + unit tests green (added lfsObjectRelPath path-layout test). Per-provider live cloud bakes still pending.

Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
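To tell a broken pointer file from smudged content, a check can parse the git-lfs pointer format. The sketch below is illustrative only: `parseLfsPointer` is a hypothetical name, and it assumes the v1 pointer layout (a `version` line followed by `oid sha256:<hex>` and `size <bytes>` lines). A full verification would additionally hash the file and compare the sha256 with the oid, which is omitted here.

```typescript
// First line of every git-lfs v1 pointer file.
const LFS_POINTER_VERSION = "version https://git-lfs.github.com/spec/v1";

interface LfsPointer {
  oid: string;  // sha256 hex of the real content
  size: number; // size of the real content in bytes
}

// Hypothetical helper: returns the parsed pointer, or null when the text is
// not a git-lfs pointer (i.e. it is presumably real, smudged content).
function parseLfsPointer(text: string): LfsPointer | null {
  const lines = text.split("\n");
  if (lines[0] !== LFS_POINTER_VERSION) return null;
  let oid: string | null = null;
  let size: number | null = null;
  for (const line of lines.slice(1)) {
    const oidMatch = /^oid sha256:([0-9a-f]{64})$/.exec(line);
    if (oidMatch) oid = oidMatch[1];
    const sizeMatch = /^size (\d+)$/.exec(line);
    if (sizeMatch) size = Number(sizeMatch[1]);
  }
  return oid !== null && size !== null ? { oid, size } : null;
}

// Example: a pointer built from a made-up oid, versus arbitrary file content.
const examplePointerText = [
  LFS_POINTER_VERSION,
  `oid sha256:${"ab".repeat(32)}`,
  "size 524288",
  "",
].join("\n");
console.log(parseLfsPointer(examplePointerText));
console.log(parseLfsPointer("arbitrary file content"));
```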
Extends the docker git-lfs fix (#112) to the cloud providers: daytona, hetzner, vercel, e2b. Targets `nightly` because it builds directly on #112 (which is only on nightly).

Problem

Cloud boxes seed `/workspace` from a host-side shallow `git clone --no-checkout file://<hostRepo>` → tar `.git/` → upload → in-box `git checkout`. A plain clone never populates `.git/lfs/objects`, and cloud boxes have no host git credentials (and, on hetzner, locked egress). So an LFS-tracked repo either checked out with broken pointer files or failed the seed outright (the in-box smudge hits the upstream LFS endpoint unauthenticated → error under `set -e`). Docker dodges this via its bind-mounted, shared `.git/lfs`; cloud has no bind mount.

Scope

Read/seed parity with docker: LFS repos check out with real content at create and checkpoint-restore. Push-back of box-created LFS objects and lazy on-demand `git lfs pull` are intentionally out of scope (both need a relay LFS transport; follow-up). For docker, push-back already works as an emergent property of the shared `.git/lfs` + host-side `git push`.

Layer 1 - base images (git-lfs binary + system filter)

- hetzner (`install-box.sh`) / e2b (`build-template.sh`): add `git-lfs` to the package list + `git lfs install --system --skip-repo`.
- vercel (`provision.sh`): git-lfs as a separate best-effort step (not the atomic base dnf transaction: AL2023 may not carry it; falls back to the git-lfs packagecloud rpm repo) + system filter.
- daytona inherits git-lfs from `Dockerfile.box` (Support git-lfs repos in the box #112); comment only.
- `--system` is required: cloud boxes have no bind-mounted `~/.gitconfig` carrying `filter.lfs.process`.

Layer 2 - host-side working-set seeding (`sandbox-cloud/workspace-seed.ts`)

- `seedCloneLfsObjects`: probe `git lfs ls-files`, best-effort `git lfs fetch origin <ref>` (host holds the creds), then copy only the checkout ref's content-addressed object blobs (`.git/lfs/objects/aa/bb/<oid>`) into the clone so they ride the existing `workspace.tar.gz`. The in-box checkout then smudges real content with zero box network/creds. Bounded to the working set; best-effort (missing oid → pointer, never fails the seed). Wired into both `seedFromGitClone` call sites (incl. the adaptive-depth rebuild).
- Checkpoint-restore delta path ships only the oids the delta introduces (`target \ checkpointTip`) as `agentbox-delta-lfs.tar.gz`, extracted into the box's `.git` before the reset.

Verification - live cloud e2e ✅

Re-baked (`agentbox prepare -f`) and created a box on each provider against the ssh LFS fixture (`../agentbox-test-repo`, `sample.bin` oid `fc270de147…`), then asserted in-box. All three pass identically: `sample.bin` is real 524288-byte content (not a pointer), sha256 == the host oid, `git lfs ls-files` shows it downloaded, system filter registered, git status clean:

- `seeded 1 git-lfs object(s) for HEAD`, `fc270de147…` ✓
- `fc270de147…` ✓
- `seeded 1 git-lfs object(s) for HEAD`, `fc270de147…` ✓
- … `Dockerfile.box` + the same host-side `seedCloneLfsObjects`.
- Build + lint + unit tests green (added `lfsObjectRelPath` path-layout test).

https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
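
The checkpoint-restore delta described under Layer 2 ships only the oids in `target \ checkpointTip`. A minimal sketch of that set difference follows, assuming each side is already available as a list of oids; `deltaLfsOids` is a hypothetical name and how the PR actually enumerates the two lists is not shown here.

```typescript
// Hypothetical helper: returns the oids referenced by the restore target that
// the checkpoint tip does not already reference, de-duplicated and in
// first-seen order. These are the only objects the delta tarball needs.
function deltaLfsOids(
  target: readonly string[],
  checkpointTip: readonly string[],
): string[] {
  const alreadyInBox = new Set(checkpointTip);
  const delta = new Set<string>();
  for (const oid of target) {
    if (!alreadyInBox.has(oid)) delta.add(oid);
  }
  return Array.from(delta);
}

// Example with short placeholder oids for readability.
console.log(deltaLfsOids(["a", "b", "c", "b"], ["b"]));
```

An empty result means the delta introduces no new LFS objects, so no extra tarball would be needed for that restore.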