RFC: skip unchanged checkpoints via in-guest eBPF inspector

### Summary

Most agent-driven sandbox turns produce no recovery-relevant state. A recent paper, **Crab — A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes** (arXiv [2604.28138](https://arxiv.org/abs/2604.28138), Wu et al., HKUST, Apr 2026), reports that >75% of turns can be classified as "no checkpoint needed" if the runtime watches OS-visible effects, and that this single decision recovers most of the cost of full per-turn checkpointing.

This RFC proposes adapting that idea to e2b: an **in-guest eBPF "Inspector"** in `envd` that lets the orchestrator answer "has anything recovery-relevant changed since the last snapshot?" in microseconds, so that `POST /sandboxes/{id}/snapshots` can short-circuit and avoid a Firecracker pause when the answer is no.

### Background: why the paper's design doesn't drop in

Crab's prototype is built on **runc + CRIU + ZFS**, and its Inspector attaches eBPF to **host kernel** raw syscall tracepoints. e2b runs sandboxes as **Firecracker microVMs**; the paper itself cites e2b/Firecracker only as an expensive *baseline* in Table 1 (`E2B / VM / Full VM / High`), and Crab does not target the microVM model.

Concretely:

- Guest syscalls trap to the *guest* kernel; KVM only surfaces VM-exits to the host. Host-side kprobes / raw_tracepoints / BPF-LSM cannot see guest VFS, file paths, or PIDs. (Confirmed via the Firecracker FAQ, the KVM API docs, Kata #7556, and Ant Group's "Kata + eBPF" CWPP whitepaper, which moves eBPF *inside* the guest for the same reason.)
- e2b's Firecracker `/memory` and `/dirty-memory` endpoints (`packages/orchestrator/pkg/sandbox/fc/memory.go`) require a pause first, so they can't drive a *pre-pause* skip decision.
- Block-level dirty signals available host-side (NBD overlay, qcow2 bitmap, dm-thin) are extent-grained and have no file or process semantics — they can confirm "nothing at all changed" but can't power the more granular partial-checkpoint dimension.

So a faithful port is impossible. The proposal below keeps the core "skip vs full" idea and explicitly drops everything else.

### Goal

When the SDK calls `POST /sandboxes/{id}/snapshots` with a new opt-in flag, return the previous snapshot's ID immediately if nothing has changed in the sandbox since that snapshot — paying zero pause cost.

### Non-goals

- No partial (memory-only / rootfs-only) snapshots. The current `Snapshot` struct (`packages/orchestrator/pkg/sandbox/snapshot.go`) is monolithic; splitting it is a separate, larger lift.
- No HTTP MITM proxy on the agent↔LLM path. e2b's primary deployment is *agent-with-a-sandbox* — the agent runs on customer infra and never traverses our network.
- No fast-forward / cached request-response replay. That belongs in the agent / SDK layer.
- No host-scoped checkpoint scheduler or versioned `(P_j, F_k)` manifests à la Crab §5.3. Defer.

### API surface (no new endpoint)

Reuse the existing `POST /sandboxes/{sandboxID}/snapshots` (`spec/openapi.yml:2483`).

Request body adds one optional field:

```jsonc
{
  "name": "...",
  "skipIfUnchanged": true   // default false, fully backward-compatible
}
```

Response (`SnapshotInfo`) gains:

```jsonc
{
  "templateID": "...",
  "buildID":    "...",
  "unchanged":  true        // when true, templateID/buildID point at the previous snapshot
}
```

### Architecture

Three slices, each gated behind a LaunchDarkly flag (`inspector_skip_unchanged`) and an envd version bump.

#### 1. Inspector (in-guest, in `envd`)

- **Filesystem net-change tracking.** eBPF attached to guest raw syscall tracepoints (`sys_enter_openat` with `O_CREAT|O_TRUNC|O_WRONLY`, `unlinkat`, `renameat2`, `write/pwrite/writev`, `truncate`, `mkdirat`, `linkat`, `symlinkat`). Per-inode "last operation" entries live in a BPF hash map; a userspace daemon computes the *net* change set per Crab §5.2 (a write-then-unlink within the same epoch ⇒ no change).
- **Process / memory net-change tracking.** Use the `/sys/fs/cgroup/user` cgroup envd already creates for spawned processes (`packages/envd/internal/services/process/handler/handler.go:123`) to enumerate user processes. For each survivor, read soft-dirty bits from `/proc/PID/pagemap`; reset via `/proc/PID/clear_refs` after a checkpoint completes. Track cgroup-level births / deaths to catch new long-lived processes.
- **Exclude envd itself.** envd is the in-guest equivalent of Crab's "agent process" (§6 *agent-in-a-sandbox*); skip its PID + handler children, and its scratch paths.
- **Epoch reset is checkpoint-driven, not turn-driven.** envd doesn't know about turns. The orchestrator tells envd "I just published checkpoint X, reset" via `ResetEpoch`.
- **Graceful degradation.** If the guest kernel lacks BTF, kprobes, or soft-dirty (we'll confirm against the actual e2b kernel config), Inspector enters a *degraded* mode in which `QueryChanges` always reports both `filesystem_changed` and `processes_changed` as `true`, so every call falls through to a full checkpoint. Correctness is preserved at every step.

New Connect-RPC service in `packages/envd/spec/inspector/inspector.proto`, mounted using the existing pattern at `packages/envd/main.go:176-187`:

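```proto
syntax = "proto3";

service Inspector {
  rpc QueryChanges(QueryChangesRequest) returns (QueryChangesResponse);
  rpc ResetEpoch(ResetEpochRequest)     returns (ResetEpochResponse);
}

// Placeholders: the fields of these three messages are still to be defined.
message QueryChangesRequest {}
message ResetEpochRequest   {}
message ResetEpochResponse  {}

message QueryChangesResponse {
  bool   filesystem_changed = 1; // net filesystem change since the last epoch reset
  bool   processes_changed  = 2; // process or memory change since the last epoch reset
  uint32 epoch_id           = 3;
}
```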

For the MVP we only consume `filesystem_changed || processes_changed`. Granular booleans are left on the wire so a future "partial checkpoint" phase can use them without a wire break.

#### 2. Orchestrator short-circuit

- Add `bool skip_if_unchanged` to `SandboxCheckpointRequest` and `bool unchanged` to `SandboxCheckpointResponse` (`packages/orchestrator/orchestrator.proto:163`).
- In `Server.Checkpoint` (`packages/orchestrator/pkg/server/sandboxes.go:552`), branch *before* `MarkStopping` and `snapshotAndCacheSandbox`:
  - If flag set + feature flag on + envd version supports Inspector, call `Inspector.QueryChanges`.
  - Both booleans `false` and a previously published snapshot is on record ⇒ return `Unchanged: true, BuildId: <last published snapshot's build id>` immediately.
  - Otherwise, fall through to the current path; on success, call `Inspector.ResetEpoch`.
- Capability gate: add `utils.CheckEnvdVersionForInspector` next to the existing `CheckEnvdVersionForSnapshot` (`packages/orchestrator/pkg/server/sandboxes.go:571`).
- Track "latest published snapshot per sandbox" in the in-memory `sandboxFactory.Sandboxes` map. Best-effort, no Redis dependency for v1; orchestrator restart simply forces the next call to a full checkpoint.

#### 3. API plumbing

- Update the request/response schemas in `spec/openapi.yml:2483-2517` and re-run `make generate/api`.
- `packages/api/internal/handlers/snapshot_template_create.go` forwards the flag into `SnapshotTemplateOpts`; propagates `unchanged` back.
- `packages/api/internal/orchestrator/snapshot_template.go:88` passes the flag on the gRPC call. When `unchanged=true`, skip the template-upsert work — the previous snapshot is already a real template — and return its ID/build directly.

### Rollout

1. **Phase 1.** Inspector + skip path behind `inspector_skip_unchanged`, off by default. Internal benchmarks on a `read_file`-heavy workload to measure the actual skip rate on e2b (Crab reports 87% on Claude-code/Terminal-Bench; ours will differ).
2. **Phase 2.** Default-on for opted-in templates after we observe meaningful skip rates with no recovery regressions.
3. **Phase 3 (separate RFC).** Partial checkpoints, versioned manifest, host-scoped scheduler. Big surgery in storage + upload paths; not scoped here.

### Risks and open questions

1. **Guest kernel capabilities.** Need to verify that e2b's microVM kernel ships with `CONFIG_DEBUG_INFO_BTF`, `CONFIG_KPROBES`, `CONFIG_MEM_SOFT_DIRTY`, and `CONFIG_BPF_SYSCALL`. If not, either enable in the next kernel rebuild or fall back to inotify+pidfd (less precise; loses net-change semantics).
2. **False negatives.** A bug in the eBPF program could miss a real change and cause incorrect recovery. The Crab paper reports zero false negatives in 2,063 manually labelled turns, but that result covers their implementation, not ours. Mitigation: add an integration test that injects faults into the Inspector path (randomly invalidated responses and timeouts) and asserts that a full checkpoint still happens.
3. **False positives.** Conservative tracking (e.g. write-then-delete in the same epoch counted as changed) is acceptable: it only causes an extra full checkpoint, which is the current behavior anyway.
4. **envd version skew.** Old templates that haven't been rebuilt with the new envd will silently take the existing always-pause path. New SDK clients passing `skipIfUnchanged=true` against old templates get a normal full checkpoint and `unchanged=false`. No breakage.
5. **Storage / latest-snapshot pointer.** In-memory only for v1 — an orchestrator restart between two skip calls forces the next one to full. If customers complain, promote to Redis.
6. **Sandbox-internal network namespaces / containers.** If a user runs Docker-in-sandbox or unshares mount/pid namespaces, eBPF in envd's namespace may miss writes inside nested namespaces. Document as a limitation; users with that pattern keep getting full snapshots.

### What this is *not* claiming

- We are not claiming feature parity with Crab. The paper's headline numbers (87% skip, 1.9% overhead) were measured on runc+CRIU+ZFS at 96-sandbox density. Our skip rate and savings will differ because Firecracker pause cost, kernel visibility, and per-snapshot overhead are all different.
- We are not changing the semantics of any existing endpoint. The new flag is opt-in; default behavior is unchanged.

### Pointers / prior art

- Paper: arXiv [2604.28138](https://arxiv.org/abs/2604.28138).
- Related host-side observability for agent sandboxes: AgentSight, arXiv [2508.02736](https://arxiv.org/abs/2508.02736).
- Kata + in-guest eBPF for similar reasons: https://katacontainers.io/blog/kata-containers-ant-container-security-with-ebpf-whitepaper/
- KVM dirty ring (relevant for a future Firecracker-side enhancement): [LWN article](https://lwn.net/Articles/805880/), [KVM API](https://docs.kernel.org/virt/kvm/api.html).

### Asks

- Maintainer feedback on the API shape (keep it on the existing `/snapshots` endpoint vs. add a dedicated `/checkpoint-fast`).
- Confirmation of guest kernel build options (BTF, soft-dirty, kprobes).
- Greenlight on the envd version-bump pattern + LaunchDarkly flag name.

