
Move orphan-network reaper to daemon process #341

@dpup


Background

run.Manager.cleanOrphanNetworks (added in the fix for #315) runs during NewManagerWithOptions and removes moat-managed container networks whose run dirs no longer exist. It is gated behind ManagerOptions.ReapOrphanNetworks, so it fires only on the moat run and moat clean paths; read-only commands skip it to avoid the per-invocation cost.

Why move it

The proxy daemon is a more natural home for the reaper:

  • The daemon already outlives individual CLI invocations (it shuts itself down after 5 minutes idle).
  • Sweep cost is paid once per daemon lifetime instead of once per moat run.
  • The daemon already tracks active runs via run-token registration, so it has authoritative knowledge of which networks are alive without scanning ~/.moat/runs/ on disk.
  • A periodic background sweep (e.g., every 10 minutes) would catch leaks from hard process kills (go test -timeout, SIGKILL) without waiting for the user's next moat run.
  • CLI startup latency would no longer scale with the number of orphaned networks on Apple containers.
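To make the run-token point concrete, here is a hypothetical sketch of an in-memory registry the daemon could consult to answer "is this run alive?" without touching disk. RunRegistry, Register, Unregister, and IsActive are illustrative names, not moat's actual daemon API.

```go
package main

import "sync"

// RunRegistry is a hypothetical sketch of the daemon's in-memory record of
// active runs, keyed by run ID and filled in when a run registers its token.
type RunRegistry struct {
	mu     sync.RWMutex
	tokens map[string]string // run ID -> run token
}

func NewRunRegistry() *RunRegistry {
	return &RunRegistry{tokens: make(map[string]string)}
}

// Register records a run as active when it registers its run token.
func (r *RunRegistry) Register(runID, token string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tokens[runID] = token
}

// Unregister drops a run once it exits.
func (r *RunRegistry) Unregister(runID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.tokens, runID)
}

// IsActive reports whether a run is currently registered. A reaper can call
// this under a read lock, concurrently with registrations.
func (r *RunRegistry) IsActive(runID string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	_, ok := r.tokens[runID]
	return ok
}
```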

Proposed work

  1. Add a periodic reaper goroutine to the daemon (internal/daemon/).
  2. Cross-reference moat-managed networks against the daemon's in-memory run registry (more authoritative than disk run dirs), falling back to disk run dirs for runs that have not registered yet.
  3. Remove the ReapOrphanNetworks plumbing in run.ManagerOptions once the daemon owns this fully.
  4. Optionally expose moat doctor or moat cleanup networks as a user-facing escape hatch for forced reaping.

Architectural rationale

From the architecture review of the #315 fix:

Orphan cleanup is a daemon responsibility, not a per-CLI-instance responsibility. Running it in the CLI-launched daemon, at startup and periodically (e.g., every N minutes), is a better fit than running it on every `moat run`/`moat list`.

Out of scope

The bounded networkCreateTimeout in internal/container/runtime.go should remain regardless; it's defense-in-depth for when the runtime itself is unresponsive, independent of how or when reaping happens.

Metadata


Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
