Add hosted worker container isolation and resource limits #5

@romgenie

Local source: coven-github/issues/04-add-hosted-worker-container-isolation.md

Summary

Hosted coven-github should run each agent task in a fresh isolated worker environment with CPU, memory, disk, network, and wall-clock controls. The current local development worker runs coven-code --headless directly on the host with a per-task workspace.

Current Evidence

  • crates/worker/src/lib.rs creates a task workspace under the configured workspace root and runs coven-code.
  • docs/container-isolation.md says self-hosted development can run directly on the host, but hosted OpenCoven should not.
  • docs/container-isolation.md defines the production target: one repository task per container, temporary filesystem, scoped installation token, CPU/memory/disk/wall-clock limits, network egress policy, and workspace deletion.
  • docs/security.md says production hosted workers should run each task in an isolated container or sandbox.

Problem

Agent tasks clone private repositories and may execute project commands. A hosted service cannot safely run arbitrary repository workloads directly on a shared host: without strong isolation and resource controls, every task shares the host's filesystem, environment, and resources with every other task.

Impact

  • A malicious or compromised repository task can access neighboring workspaces.
  • Build/test commands can consume unbounded CPU, memory, disk, or network.
  • Secrets in host environment or filesystem can leak into agent tools.
  • Workspace cleanup failures can leave private code on disk.
  • Hosted reliability and pricing cannot be controlled without resource accounting.

Proposed Design

Add a worker backend abstraction:

  • LocalWorkerBackend for self-hosted/dev mode.
  • ContainerWorkerBackend for hosted mode.
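The abstraction above could look roughly like the following. This is a minimal sketch, not the existing crates/worker API: only the names LocalWorkerBackend and ContainerWorkerBackend come from this issue, while the `WorkerBackend` trait, `TaskSpec`, `TaskOutcome`, and every method and field are hypothetical.

```rust
use std::path::PathBuf;

/// Hypothetical description of one task attempt (illustrative fields only).
#[derive(Debug, Clone)]
pub struct TaskSpec {
    pub task_id: String,
    pub workspace: PathBuf,
}

/// Hypothetical result of one task attempt.
#[derive(Debug, PartialEq)]
pub enum TaskOutcome {
    Completed,
    Failed(String),
}

/// Assumed common interface for both backends.
pub trait WorkerBackend {
    /// Short identifier for logs and config.
    fn name(&self) -> &'static str;
    /// Whether the backend isolates the task from the host.
    fn is_isolated(&self) -> bool;
    /// Run one task attempt. Both impls below are stubs.
    fn run(&self, task: &TaskSpec) -> TaskOutcome;
}

/// Self-hosted/dev mode: runs directly on the host.
pub struct LocalWorkerBackend;

/// Hosted mode: one fresh container per task attempt.
pub struct ContainerWorkerBackend {
    pub image: String,
}

impl WorkerBackend for LocalWorkerBackend {
    fn name(&self) -> &'static str {
        "local"
    }
    fn is_isolated(&self) -> bool {
        false
    }
    fn run(&self, task: &TaskSpec) -> TaskOutcome {
        // A real implementation would run `coven-code --headless` in the workspace.
        TaskOutcome::Failed(format!(
            "stub: would run task {} in {}",
            task.task_id,
            task.workspace.display()
        ))
    }
}

impl WorkerBackend for ContainerWorkerBackend {
    fn name(&self) -> &'static str {
        "container"
    }
    fn is_isolated(&self) -> bool {
        true
    }
    fn run(&self, task: &TaskSpec) -> TaskOutcome {
        // A real implementation would start a container from `self.image`.
        TaskOutcome::Failed(format!(
            "stub: would run task {} in image {}",
            task.task_id, self.image
        ))
    }
}
```

Keeping `is_isolated` on the trait lets the startup path reject a non-isolated backend in hosted mode without matching on concrete types.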

The hosted backend should:

  • create one fresh container or sandbox per task attempt;
  • mount only the task workspace and explicit input artifacts;
  • inject git auth through a controlled channel;
  • enforce CPU, memory, disk, network, and timeout limits;
  • stream filtered progress events back to the task store;
  • copy out only the result envelope and approved logs;
  • destroy the workspace/container after completion or failure.
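As one way to picture the limits in the list above, the sketch below maps a limits struct onto `docker run` arguments. Docker is an assumption here; the issue does not name a container runtime. `ResourceLimits`, `docker_run_args`, and the `/workspace` mount target are hypothetical names. The wall-clock limit is not a `docker run` flag, so a supervisor would have to stop the container after `wall_clock` elapses, and `--storage-opt size=` is only honored by some storage drivers.

```rust
use std::path::Path;
use std::time::Duration;

/// Hypothetical per-task limits; field names are illustrative.
pub struct ResourceLimits {
    pub cpus: f64,
    pub memory_mib: u64,
    pub disk_gib: u64,
    pub allow_network: bool,
    /// Enforced by the supervisor, not by `docker run` itself.
    pub wall_clock: Duration,
}

/// Build the argument list for `docker run` (assumed runtime).
pub fn docker_run_args(limits: &ResourceLimits, workspace: &Path, image: &str) -> Vec<String> {
    let mut args = vec![
        "run".to_string(),
        // Remove the container when it exits.
        "--rm".to_string(),
        format!("--cpus={}", limits.cpus),
        format!("--memory={}m", limits.memory_mib),
        // Only supported on some storage drivers; operators must verify.
        "--storage-opt".to_string(),
        format!("size={}G", limits.disk_gib),
        // Mount only the task workspace.
        "--mount".to_string(),
        format!(
            "type=bind,source={},target=/workspace",
            workspace.display()
        ),
    ];
    if !limits.allow_network {
        // Coarse on/off switch; a real egress policy needs more than this.
        args.push("--network=none".to_string());
    }
    args.push(image.to_string());
    args
}
```

Building the arguments in a pure function keeps the limit mapping unit-testable without a container runtime installed.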

Acceptance Criteria

  • Hosted mode cannot use the direct host execution backend unless an operator explicitly enables it in config.
  • Each task attempt runs in a fresh isolated workspace/container.
  • Resource limits are enforced, and a task that exceeds one reports which limit it hit in its failure state.
  • Workspace cleanup is tested for success and failure paths.
  • Logs copied out of the worker are filtered for secrets and token patterns.
  • Documentation includes minimum supported isolation backends and operator configuration.
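Two of the criteria above reduce to small pure functions, sketched here under stated assumptions. The first assumes hosted mode refuses host execution unless an operator explicitly opts in. The second redacts any whitespace-separated word containing a GitHub token prefix, which is deliberately conservative and will over-redact. `DeploymentMode`, `BackendKind`, `select_backend`, and `redact_line` are hypothetical names; a production filter would likely cover more secret formats than GitHub tokens.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DeploymentMode {
    SelfHosted,
    Hosted,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BackendKind {
    Local,
    Container,
}

/// Refuse host execution in hosted mode unless the operator opted in.
pub fn select_backend(
    mode: DeploymentMode,
    requested: BackendKind,
    allow_host_execution: bool,
) -> Result<BackendKind, String> {
    match (mode, requested) {
        (DeploymentMode::Hosted, BackendKind::Local) if !allow_host_execution => {
            Err("hosted mode requires an isolated backend".to_string())
        }
        _ => Ok(requested),
    }
}

/// GitHub token prefixes (personal, OAuth, user-to-server,
/// server-to-server, refresh, fine-grained personal).
const TOKEN_PREFIXES: [&str; 6] = ["ghp_", "gho_", "ghu_", "ghs_", "ghr_", "github_pat_"];

/// Replace every space-separated word that contains a token prefix.
pub fn redact_line(line: &str) -> String {
    line.split(' ')
        .map(|word| {
            if TOKEN_PREFIXES.iter().any(|p| word.contains(p)) {
                "[REDACTED]"
            } else {
                word
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Redacting the whole word also removes tokens embedded in clone URLs, at the cost of hiding the rest of that URL.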

Test Notes

Add backend interface unit tests and integration tests using a minimal container runtime or fake backend. Simulate timeout, disk exhaustion, process failure, and cleanup failure. Verify no raw workspace contents are persisted unless explicitly captured as approved artifacts.
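A fake backend for these tests might look like the sketch below. It assumes a guard type that deletes the workspace when dropped, so cleanup runs on both success and scripted failure. `WorkspaceGuard`, `TaskFailure`, and `run_fake_task` are hypothetical names. The sketch does not simulate a cleanup failure itself; that case would need an injectable removal function or a filesystem the test can make read-only.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical failure states a task attempt can report.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TaskFailure {
    Timeout,
    DiskExhausted,
    ProcessFailed(i32),
}

/// Owns a per-task workspace directory and removes it on drop.
pub struct WorkspaceGuard {
    path: PathBuf,
}

impl WorkspaceGuard {
    pub fn create(root: &Path, task_id: &str) -> io::Result<Self> {
        let path = root.join(task_id);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for WorkspaceGuard {
    fn drop(&mut self) {
        // Best effort here; a real worker should record a failed removal.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Fake backend: writes a placeholder file, then returns the scripted outcome.
/// Returns the workspace path so a test can check it was removed.
pub fn run_fake_task(
    root: &Path,
    task_id: &str,
    scripted: Result<(), TaskFailure>,
) -> io::Result<(PathBuf, Result<(), TaskFailure>)> {
    let guard = WorkspaceGuard::create(root, task_id)?;
    fs::write(guard.path().join("checkout.txt"), b"placeholder for private code")?;
    let path = guard.path().to_path_buf();
    // `guard` is dropped when this function returns, deleting the workspace.
    Ok((path, scripted))
}
```

A test can script each failure in turn and assert that the returned path no longer exists afterwards.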
