Evaluate sandbox isolation options: gVisor runtime, Firecracker microVMs, or cluster-in-VM #4

@pimlock

Description

Problem

The Navigator k3s cluster currently runs as a privileged Docker container with k3s's embedded containerd using the default runc runtime. While sandbox pods already enforce Landlock, seccomp, and network-namespace isolation at the process level, the sandbox containers themselves still share the host kernel through the standard OCI runtime. Adopting a stronger isolation boundary would improve the security posture for running untrusted agent code.

The SandboxTemplate proto already has a runtime_class_name field wired through to the pod spec (navigator-server/src/sandbox/mod.rs:512), so partial plumbing for alternative runtimes exists.

Options to Evaluate

Option 1: gVisor (runsc) as containerd runtime (Recommended starting point)

gVisor is a userspace application kernel written in Go that intercepts all syscalls and implements them without passing them to the host kernel. It integrates with containerd via a shim and supports Kubernetes RuntimeClass natively.

Pros:

  • Lightweight, fast startup, process-like resource model (no fixed guest resources)
  • Battle-tested at scale (powers GKE Sandbox at Google)
  • Native Kubernetes RuntimeClass integration -- our existing runtime_class_name field would work directly
  • Simple integration: install runsc binary in the cluster image, add containerd runtime config, create a RuntimeClass manifest
  • Supports amd64 and arm64
  • Defense-in-depth: gVisor also applies seccomp internally to its own Sentry process

Cons:

  • Not full VM-level isolation (the Sentry runs as a userspace process on the host kernel, though with a very restricted syscall surface)
  • Some Linux syscall compatibility gaps -- needs testing with our sandbox workloads (Python, Node.js, coding agents)
  • Higher per-syscall overhead for syscall-heavy workloads
  • May conflict with or be redundant alongside our existing Landlock/seccomp enforcement

Integration sketch:

  1. Add runsc binary to Dockerfile.cluster
  2. Add containerd runtime config for runsc handler
  3. Deploy a RuntimeClass named gvisor in the k3s manifests
  4. Set runtime_class_name: "gvisor" on sandbox templates (already supported in proto/server)
  5. Test sandbox workloads for compatibility
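
Steps 2-4 amount to small config changes. As a hedged sketch (the handler name and the k3s paths are assumptions to verify against Dockerfile.cluster): the containerd side needs a `runsc` handler entry such as `runtime_type = "io.containerd.runsc.v1"` in k3s's `config.toml.tmpl`, and the Kubernetes side needs a matching RuntimeClass:

```yaml
# RuntimeClass mapping the name our sandbox templates reference via
# runtime_class_name to containerd's "runsc" handler. Placed under
# /var/lib/rancher/k3s/server/manifests/ it is auto-applied by k3s.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc   # must match the containerd runtime handler name
```

A sandbox template with runtime_class_name: "gvisor" then lands on pods as spec.runtimeClassName: gvisor, which the kubelet resolves to the runsc handler.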

Option 2: Firecracker microVMs via firecracker-containerd

firecracker-containerd enables containerd to run containers inside Firecracker microVMs, providing hardware-level isolation via KVM. Each container gets its own lightweight VM (~125ms boot, ~5MB overhead).

Pros:

  • Strongest isolation model (hardware-level via KVM hypervisor)
  • Proven at massive scale (AWS Lambda, Fargate)
  • Minimal overhead per microVM

Cons:

  • Requires /dev/kvm -- problematic for our Docker-in-Docker setup and essentially impossible on macOS (no nested KVM). Would likely require running the cluster on bare-metal Linux or in a VM with nested virtualization enabled.
  • Requires a custom containerd binary (firecracker control plugin compiles into containerd), a VM runtime shim, an in-VM agent, and a custom root filesystem image builder -- significantly more complex than gVisor
  • Heavier operational burden: must build/maintain a microVM root filesystem image containing runc and the firecracker agent
  • Go-only ecosystem; project maintenance cadence should be evaluated
  • Does not integrate via standard Kubernetes RuntimeClass -- uses its own containerd plugin/API
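
The first con can be turned into a quick preflight. This is a sketch, not project tooling: it only checks whether the host exposes a usable /dev/kvm (which our Docker-in-Docker setup would additionally need passed through, e.g. `docker run --device /dev/kvm`), and whether nested virtualization is enabled if the cluster itself runs inside a VM:

```shell
# Preflight for Option 2: Firecracker requires a readable/writable /dev/kvm.
# On macOS hosts (and most CI containers) this will report "missing".
if [ -c /dev/kvm ] && [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
  kvm_status="available"
else
  kvm_status="missing"
fi
echo "KVM ${kvm_status} on this host"

# Nested virtualization (relevant if the cluster runs inside a VM) is
# exposed via the kvm module parameters on Linux:
for f in /sys/module/kvm_intel/parameters/nested /sys/module/kvm_amd/parameters/nested; do
  if [ -r "$f" ]; then
    echo "nested: $(cat "$f") ($f)"
  fi
done
```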

Integration sketch:

  1. Replace or heavily modify the k3s embedded containerd with firecracker-containerd
  2. Build a microVM root filesystem with runc + firecracker agent
  3. Configure CNI networking (tc-redirect-tap plugin)
  4. Significant changes to cluster image and entrypoint
  5. Would not work for local macOS development without nested virtualization

Option 3: Run the entire k3s cluster inside a VM

Rather than changing the container runtime, run the entire privileged k3s container inside a lightweight VM (QEMU, Cloud Hypervisor, Lima on macOS, etc.).

Pros:

  • Isolates the entire cluster, including the privileged Docker container surface
  • Defense-in-depth: even a container escape stays within the VM
  • No changes to the container runtime or sandbox pod configuration

Cons:

  • Adds startup latency and resource overhead
  • More complex lifecycle management (VM provisioning, networking, volume mounting)
  • Complicates local development, especially on macOS, where Lima/colima already provides a Linux VM for Docker and this would add or restructure a VM layer
  • Does not improve isolation between sandbox pods (all still share the same runc runtime inside the VM)

Recommendation

Start with gVisor (Option 1) as it offers the best balance of security improvement, integration simplicity, and compatibility with our existing architecture. It works within our current k3s-in-Docker model, leverages the existing runtime_class_name plumbing, and doesn't require KVM or nested virtualization.

Option 3 (cluster-in-VM) could be pursued independently as an additional layer. Option 2 (Firecracker) is the strongest isolation but has the highest integration cost and significant platform constraints.

Acceptance Criteria

  • Document findings from evaluating each option against our workloads
  • Prototype gVisor integration in the k3s cluster image
  • Test sandbox compatibility (Python 3.12, Node.js, coding agents) under gVisor
  • Measure performance impact (sandbox startup time, steady-state overhead)
  • Decision document with chosen approach and migration plan
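
For the compatibility item, one quick sanity check is that gVisor implements its own kernel log: `dmesg` inside a sandboxed pod prints the gVisor Sentry banner rather than host kernel messages. A hypothetical smoke-test pod (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gvisor-smoke
spec:
  runtimeClassName: gvisor   # the RuntimeClass deployed in the integration sketch
  restartPolicy: Never
  containers:
    - name: check
      image: busybox
      command: ["dmesg"]     # under gVisor this prints the "Starting gVisor..." banner
```

`kubectl logs gvisor-smoke` showing the gVisor banner confirms the pod did not silently fall back to runc.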

Originally by @drew on 2026-02-10T13:44:36.529-08:00
