RFC: skip unchanged checkpoints via in-guest eBPF inspector #2580

@void-main

Summary

Most agent-driven sandbox turns produce no recovery-relevant state. A recent paper, Crab — A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes (arXiv 2604.28138, Wu et al., HKUST, Apr 2026), reports that >75% of turns can be classified as "no checkpoint needed" if the runtime watches OS-visible effects, and that this single decision recovers most of the cost of full per-turn checkpointing.

This RFC proposes adapting that idea to e2b: an in-guest eBPF "Inspector" in envd that lets the orchestrator answer "has anything recovery-relevant changed since the last snapshot?" in microseconds, so that POST /sandboxes/{id}/snapshots can short-circuit and avoid a Firecracker pause when the answer is no.

Background: why the paper's design doesn't drop in

Crab's prototype is built on runc + CRIU + ZFS. Its Inspector attaches eBPF to host kernel raw syscall tracepoints. e2b runs sandboxes as Firecracker microVMs, and the paper itself cites e2b/Firecracker only as an expensive baseline in Table 1 (E2B / VM / Full VM / High). Crab does not target the microVM model.

Concretely:

  • Guest syscalls trap to the guest kernel; KVM only surfaces VM-exits to the host. Host-side kprobes / raw_tracepoints / BPF-LSM cannot see guest VFS, file paths, or PIDs. (Confirmed via the Firecracker FAQ, the KVM API docs, Kata #7556, and Ant Group's "Kata + eBPF" CWPP whitepaper, which moves eBPF inside the guest for the same reason.)
  • e2b's Firecracker /memory and /dirty-memory endpoints (packages/orchestrator/pkg/sandbox/fc/memory.go) require a pause first, so they can't drive a pre-pause skip decision.
  • Block-level dirty signals available host-side (NBD overlay, qcow2 bitmap, dm-thin) are extent-grained and have no file or process semantics — they can confirm "nothing at all changed" but can't power the more granular partial-checkpoint dimension.

So a faithful port is impossible. The proposal below keeps the core "skip vs full" idea and explicitly drops everything else.

Goal

When the SDK calls POST /sandboxes/{id}/snapshots with a new opt-in flag, return the previous snapshot's ID immediately if nothing has changed in the sandbox since that snapshot — paying zero pause cost.

Non-goals

  • No partial (memory-only / rootfs-only) snapshots. The current Snapshot struct (packages/orchestrator/pkg/sandbox/snapshot.go) is monolithic; splitting it is a separate, larger lift.
  • No HTTP MITM proxy on the agent↔LLM path. e2b's primary deployment is agent-with-a-sandbox: the agent runs on customer infra, so its LLM traffic never traverses our network.
  • No fast-forward / cached request-response replay. That belongs in the agent / SDK layer.
  • No host-scoped checkpoint scheduler or versioned (P_j, F_k) manifests à la Crab §5.3. Defer.

API surface (no new endpoint)

Reuse the existing POST /sandboxes/{sandboxID}/snapshots (spec/openapi.yml:2483).

Request body adds one optional field:

{
  "name": "...",
  "skipIfUnchanged": true   // default false, fully backward-compatible
}

Response (SnapshotInfo) gains:

{
  "templateID": "...",
  "buildID":    "...",
  "unchanged":  true        // when true, templateID/buildID point at the previous snapshot
}
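
A minimal Go sketch of how a client might encode the proposed request and decode the response. The JSON field names come from this RFC; the Go type names and the helpers encodeRequest and decodeInfo are hypothetical and are not part of any existing e2b SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SnapshotRequest mirrors the proposed request body (hypothetical Go type).
type SnapshotRequest struct {
	Name string `json:"name,omitempty"`
	// Omitted when false, so the body is byte-identical to today's request.
	SkipIfUnchanged bool `json:"skipIfUnchanged,omitempty"`
}

// SnapshotInfo mirrors the proposed response (hypothetical Go type).
type SnapshotInfo struct {
	TemplateID string `json:"templateID"`
	BuildID    string `json:"buildID"`
	Unchanged  bool   `json:"unchanged,omitempty"`
}

// encodeRequest serialises the request body to JSON.
func encodeRequest(r SnapshotRequest) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

// decodeInfo parses a response body. A response from an older server that
// lacks "unchanged" decodes with Unchanged == false.
func decodeInfo(body string) (SnapshotInfo, error) {
	var info SnapshotInfo
	err := json.Unmarshal([]byte(body), &info)
	return info, err
}

func main() {
	body, _ := encodeRequest(SnapshotRequest{Name: "after-turn-12", SkipIfUnchanged: true})
	fmt.Println(body) // {"name":"after-turn-12","skipIfUnchanged":true}

	info, _ := decodeInfo(`{"templateID":"tpl_1","buildID":"b_1"}`)
	fmt.Println(info.Unchanged) // false
}
```

Leaving the flag out when it is false keeps old and new clients indistinguishable on the wire, which is what makes the change backward-compatible.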

Architecture

Three slices, each gated behind a LaunchDarkly flag (inspector_skip_unchanged) and an envd version bump.

1. Inspector (in-guest, in envd)

  • Filesystem net-change tracking. eBPF attached to guest raw syscall tracepoints (sys_enter_openat with O_CREAT|O_TRUNC|O_WRONLY, unlinkat, renameat2, write/pwrite/writev, truncate, mkdirat, linkat, symlinkat). Per-inode "last operation" entries in a BPF hash map; userspace daemon computes the net change set per Crab §5.2 (a file created, written, and then unlinked within the same epoch ⇒ no change).
  • Process / memory net-change tracking. Use the /sys/fs/cgroup/user cgroup envd already creates for spawned processes (packages/envd/internal/services/process/handler/handler.go:123) to enumerate user processes. For each survivor, read soft-dirty bits (bit 55 of each entry) from /proc/PID/pagemap; reset by writing 4 to /proc/PID/clear_refs after a checkpoint completes. Track cgroup-level births / deaths to catch new long-lived processes.
  • Exclude envd itself. envd is the in-guest equivalent of Crab's "agent process" (§6 agent-in-a-sandbox); skip its PID + handler children, and its scratch paths.
  • Epoch reset is checkpoint-driven, not turn-driven. envd doesn't know about turns. The orchestrator tells envd "I just published checkpoint X, reset" via ResetEpoch.
  • Graceful degradation. If the guest kernel lacks BTF, kprobes, or soft-dirty (we'll confirm against the actual e2b kernel config), Inspector enters a degraded mode where QueryChanges always reports both filesystem_changed and processes_changed as true, so every request falls through to a full checkpoint. Correctness is preserved at every step.
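
The filesystem net-change computation can be sketched in Go as a fold over per-inode events. This is an illustration under stated assumptions, not Crab's algorithm as published: the Op and Event types and NetChangeSet are hypothetical, cancellation is assumed to apply only to inodes that were created within the epoch, and hard links and renames are ignored for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Op is a coarse class of filesystem operation (hypothetical).
type Op int

const (
	OpCreate Op = iota // the inode came into existence
	OpWrite            // content or size changed
	OpUnlink           // the inode was removed
)

// Event is one observed operation, in arrival order.
type Event struct {
	Inode uint64
	Op    Op
}

// summary is the folded per-inode state for the current epoch.
type summary struct {
	bornInEpoch bool // the first event seen for this inode was a create
	alive       bool // the inode still exists after the last event
}

// NetChangeSet returns the sorted inodes whose state differs from the epoch
// start. An inode that was both created and unlinked inside the epoch nets
// out to nothing; every other touched inode counts as changed.
func NetChangeSet(events []Event) []uint64 {
	state := make(map[uint64]*summary)
	for _, ev := range events {
		s, seen := state[ev.Inode]
		if !seen {
			s = &summary{bornInEpoch: ev.Op == OpCreate, alive: true}
			state[ev.Inode] = s
		}
		switch ev.Op {
		case OpCreate:
			s.alive = true
		case OpUnlink:
			s.alive = false
		}
		// OpWrite never changes liveness: writing through an open fd to an
		// already-unlinked inode does not bring the file back.
	}
	changed := make([]uint64, 0, len(state))
	for ino, s := range state {
		if s.bornInEpoch && !s.alive {
			continue // scratch file: created and removed within the epoch
		}
		changed = append(changed, ino)
	}
	sort.Slice(changed, func(i, j int) bool { return changed[i] < changed[j] })
	return changed
}

func main() {
	events := []Event{
		{10, OpCreate}, {10, OpWrite}, {10, OpUnlink}, // temp file, cancels out
		{20, OpWrite},  // pre-existing file modified
		{30, OpUnlink}, // pre-existing file deleted
	}
	fmt.Println(NetChangeSet(events)) // [20 30]
}
```

An empty result corresponds to filesystem_changed = false. Unlinking a file that existed at the epoch start stays a change, which errs on the side of a full checkpoint.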

New Connect-RPC service in packages/envd/spec/inspector/inspector.proto, mounted using the existing pattern at packages/envd/main.go:176-187:

service Inspector {
  rpc QueryChanges(QueryChangesRequest) returns (QueryChangesResponse);
  rpc ResetEpoch(ResetEpochRequest)     returns (ResetEpochResponse);
}

message QueryChangesResponse {
  bool   filesystem_changed = 1;
  bool   processes_changed  = 2;
  uint32 epoch_id           = 3;
}

For the MVP we only consume filesystem_changed || processes_changed. Granular booleans are left on the wire so a future "partial checkpoint" phase can use them without a wire break.
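
One way the envd side could populate these fields, including the degraded mode, is sketched below in Go. The Tracker type and its counters are hypothetical stand-ins for whatever the Inspector daemon ends up tracking; only the three response fields come from the proto above.

```go
package main

import "fmt"

// QueryChangesResponse mirrors the proto message above.
type QueryChangesResponse struct {
	FilesystemChanged bool
	ProcessesChanged  bool
	EpochID           uint32
}

// Tracker is a hypothetical snapshot of Inspector state for one epoch.
type Tracker struct {
	Degraded      bool   // kernel lacks a required feature
	Epoch         uint32 // bumped by ResetEpoch
	ChangedInodes int    // size of the filesystem net change set
	DirtyPages    int    // soft-dirty pages across surviving user processes
	ProcessChurn  int    // births plus deaths in the user cgroup
}

// Query builds the response. In degraded mode both booleans are forced to
// true so the caller always takes the full-checkpoint path.
func (t Tracker) Query() QueryChangesResponse {
	if t.Degraded {
		return QueryChangesResponse{FilesystemChanged: true, ProcessesChanged: true, EpochID: t.Epoch}
	}
	return QueryChangesResponse{
		FilesystemChanged: t.ChangedInodes > 0,
		ProcessesChanged:  t.DirtyPages > 0 || t.ProcessChurn > 0,
		EpochID:           t.Epoch,
	}
}

// MustCheckpoint is the MVP consumer: any change means a full checkpoint.
func MustCheckpoint(r QueryChangesResponse) bool {
	return r.FilesystemChanged || r.ProcessesChanged
}

func main() {
	idle := Tracker{Epoch: 3}
	fmt.Println(MustCheckpoint(idle.Query())) // false

	degraded := Tracker{Degraded: true, Epoch: 3}
	fmt.Println(MustCheckpoint(degraded.Query())) // true
}
```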

2. Orchestrator short-circuit

  • Add bool skip_if_unchanged to SandboxCheckpointRequest and bool unchanged to SandboxCheckpointResponse (packages/orchestrator/orchestrator.proto:163).
  • In Server.Checkpoint (packages/orchestrator/pkg/server/sandboxes.go:552), branch before MarkStopping and snapshotAndCacheSandbox:
    • If flag set + feature flag on + envd version supports Inspector, call Inspector.QueryChanges.
    • Both booleans false ⇒ return Unchanged: true, BuildId: <last published snapshot's build id> immediately.
    • Otherwise, fall through to the current path; on success, call Inspector.ResetEpoch.
  • Capability gate: add utils.CheckEnvdVersionForInspector next to the existing CheckEnvdVersionForSnapshot (packages/orchestrator/pkg/server/sandboxes.go:571).
  • Track "latest published snapshot per sandbox" in the in-memory sandboxFactory.Sandboxes map. Best-effort, no Redis dependency for v1; orchestrator restart simply forces the next call to a full checkpoint.

3. API plumbing

  • Update the request/response schemas in spec/openapi.yml:2483-2517 and re-run make generate/api.
  • packages/api/internal/handlers/snapshot_template_create.go forwards the flag into SnapshotTemplateOpts; propagates unchanged back.
  • packages/api/internal/orchestrator/snapshot_template.go:88 passes the flag on the gRPC call. When unchanged=true, skip the template-upsert work — the previous snapshot is already a real template — and return its ID/build directly.

Rollout

  1. Phase 1. Inspector + skip path behind inspector_skip_unchanged, off by default. Internal benchmarks on a read_file-heavy workload to measure the actual skip rate on e2b (Crab reports 87% on Claude-code/Terminal-Bench; ours will differ).
  2. Phase 2. Default-on for opted-in templates after we observe meaningful skip rates with no recovery regressions.
  3. Phase 3 (separate RFC). Partial checkpoints, versioned manifest, host-scoped scheduler. Big surgery in storage + upload paths; not scoped here.

Risks and open questions

  1. Guest kernel capabilities. Need to verify that e2b's microVM kernel ships with CONFIG_DEBUG_INFO_BTF, CONFIG_KPROBES, CONFIG_MEM_SOFT_DIRTY, CONFIG_BPF_SYSCALL, and CONFIG_FTRACE_SYSCALLS (required for the per-syscall sys_enter_* tracepoints). If not, either enable in the next kernel rebuild or fall back to inotify+pidfd (less precise; loses net-change semantics).
  2. False negatives. A bug in the eBPF program could miss a real change and cause incorrect recovery. The Crab paper reports zero false negatives in 2,063 manually labelled turns, but that was measured on its own stack. Mitigation: add an integration test that randomly invalidates the Inspector's response (including by timing out) and asserts that a full checkpoint still happens.
  3. False positives. Conservative tracking (e.g. write-then-delete in the same epoch counted as changed) is acceptable: it only causes an extra full checkpoint, which is the current behavior anyway.
  4. envd version skew. Old templates that haven't been rebuilt with the new envd will silently take the existing always-pause path. New SDK clients passing skipIfUnchanged=true against old templates get a normal full checkpoint and unchanged=false. No breakage.
  5. Storage / latest-snapshot pointer. In-memory only for v1 — an orchestrator restart between two skip calls forces the next one to full. If customers complain, promote to Redis.
  6. Sandbox-internal namespaces / containers. If a user runs Docker-in-sandbox or unshares mount/pid namespaces, eBPF in envd's namespace may miss writes inside nested namespaces. Document as a limitation; users with that pattern keep getting full snapshots.
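
Risk 4 depends on the capability gate failing closed. A rough Go sketch of such a gate follows, assuming envd versions are dotted numeric strings such as 0.2.0. The minimum version below is a placeholder rather than a real envd release, and SupportsInspector is a hypothetical stand-in for the proposed utils.CheckEnvdVersionForInspector.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minInspectorVersion is a placeholder; the real value would be the first
// envd release that ships the Inspector service.
const minInspectorVersion = "0.9.0"

// parseVersion accepts "MAJOR.MINOR.PATCH" with an optional leading "v".
func parseVersion(v string) ([3]int, bool) {
	var out [3]int
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// SupportsInspector reports whether envdVersion is at least the minimum.
// Unparseable versions are treated as unsupported, which routes the request
// to the existing always-pause path.
func SupportsInspector(envdVersion string) bool {
	got, ok := parseVersion(envdVersion)
	if !ok {
		return false
	}
	want, _ := parseVersion(minInspectorVersion)
	for i := range got {
		if got[i] != want[i] {
			return got[i] > want[i]
		}
	}
	return true // exactly the minimum version
}

func main() {
	for _, v := range []string{"0.8.9", "0.9.0", "0.10.1", "dev"} {
		fmt.Println(v, SupportsInspector(v))
	}
}
```

Comparing components numerically matters here: a plain string comparison would rank 0.10.1 below 0.9.0.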

What this is not claiming

  • We are not claiming feature parity with Crab. The paper's headline number (87% skip, 1.9% overhead) was measured on runc+CRIU+ZFS at 96-sandbox density. Our skip rate and savings will be different because Firecracker pause cost, kernel visibility, and per-snapshot overhead are all different.
  • We are not changing the semantics of any existing endpoint. The new flag is opt-in; default behavior is unchanged.

Pointers / prior art

Asks

  • Maintainer feedback on the API shape (keep it on the existing /snapshots endpoint vs. add a dedicated /checkpoint-fast).
  • Confirmation of guest kernel build options (BTF, soft-dirty, kprobes).
  • Greenlight on the envd version-bump pattern + LaunchDarkly flag name.
