Lifecycle-aware cleanup of agent backend temp dirs (/tmp/claude-{uid}) to prevent /tmp quota exhaustion #957

Description

@realproject7

Problem

Claude Code overrides TMPDIR to /tmp/claude-{uid} and never cleans it up. On Linux hosts where /tmp is mounted with a per-user quota (usrquota), this directory accumulates indefinitely — Claude's scratch/shell-snapshot/sandbox-setup files, plus any temp files the agent's commands create. Once the user's /tmp quota is exhausted, every Claude bash call fails silently (exit 1, empty stdout/stderr) because Claude can't write its temp/sandbox-setup files before exec. The agent becomes completely non-functional.

Observed (VPS, Ubuntu 26.04)

  • /tmp mounted tmpfs … usrquota, ~1 GB per-user quota.
  • /tmp/claude-1000 grew to 909 MB / 63,370 inodes over ~10 days → quota exhausted.
  • Symptom: every bash command (even true) → exit 1, no output, no error surfaced; the Read tool, plain /bin/bash, network, and Codex all worked fine.
  • Extremely hard to diagnose — it masquerades as a bwrap/AppArmor/user-namespace sandbox failure. It is NOT: bwrap smoke tests pass, userns works, the sandbox is never even reached. The real signal was a Disk quota exceeded on a /tmp write. Clearing /tmp/claude-1000 immediately restored bash → exit 0.
  • macOS unaffected (no /tmp quota; different temp layout).
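Because the failure looks like a sandbox problem, a quick check of the temp dir itself can shorten the diagnosis. The following is a minimal sketch, not part of QuadWork or Claude Code: `dir_usage` and `probe_write` are hypothetical helper names. It reports the apparent size and entry count of a directory (an approximation of the MB / inode figures above, not a quota-accurate number) and tries a small write to surface the errno name, which would be `EDQUOT` in the situation described here.

```python
import errno
import os
import tempfile


def dir_usage(root):
    """Return (apparent_bytes, entry_count) for everything under root.

    Sizes come from st_size, so they approximate rather than match quota usage.
    """
    total_bytes = 0
    entries = 0
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda e: None):
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable
            entries += 1
            total_bytes += st.st_size
    return total_bytes, entries


def probe_write(directory):
    """Write a small scratch file in directory.

    Returns None on success, otherwise the errno name (e.g. "EDQUOT", "ENOSPC").
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as fh:
            fh.write(b"x" * 4096)
            fh.flush()
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
```

For example, `probe_write("/tmp/claude-1000")` returning a non-None value while `probe_write` on a directory outside /tmp succeeds would point at the temp dir rather than at the sandbox.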

Why this belongs in QuadWork

A host-level cron/systemd-tmpfiles purge fixes one machine, but every QuadWork install on a Linux host with a /tmp quota is exposed and will hit a total, silent agent outage that's near-impossible to diagnose. QuadWork owns the agent lifecycle, so it's the right layer to keep backend temp bounded — and it can clean at known-safe moments (agent teardown) rather than blindly.

Proposed scope

  1. On agent teardown (reset / full reset / restart / stop), purge only stale entries from the backend's temp dir for that uid:
    • Claude: /tmp/claude-{uid}
    • Codex / Gemini: confirm + cover their equivalent temp dirs.
  2. Periodic sweep from the server process (e.g. hourly): remove entries under /tmp/claude-* (+ codex/gemini equivalents) that are older than a conservative age (48–72h), judged by atime so that files belonging to currently active sessions are spared.
  3. Make the enable flag and the age threshold configurable, with safe defaults.
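The scope above could be sketched roughly as follows. This is an illustration under stated assumptions, not QuadWork's implementation: `CleanupConfig`, `newest_use`, and `sweep_stale` are hypothetical names, and the 72h default is simply the upper end of the range proposed above. As a further assumption, the sketch uses the later of atime and mtime for files (and mtime only for directories, since walking a directory can itself refresh its atime), and it removes a top-level subtree only when nothing inside it has been used since the cutoff.

```python
import os
import shutil
import stat
import time
from dataclasses import dataclass


@dataclass
class CleanupConfig:
    # Hypothetical settings; names and defaults are assumptions for this sketch.
    enabled: bool = True
    max_age_hours: float = 72.0


def newest_use(path):
    """Latest timestamp in the tree at path.

    Files contribute max(atime, mtime); directories contribute mtime only.
    """
    st = os.lstat(path)
    if not stat.S_ISDIR(st.st_mode):
        return max(st.st_atime, st.st_mtime)
    newest = st.st_mtime
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames:
            newest = max(newest, os.lstat(os.path.join(dirpath, name)).st_mtime)
        for name in filenames:
            fst = os.lstat(os.path.join(dirpath, name))
            newest = max(newest, fst.st_atime, fst.st_mtime)
    return newest


def sweep_stale(temp_dir, config, now=None):
    """Remove top-level entries of temp_dir unused for max_age_hours.

    Returns the list of removed paths; does nothing if disabled or temp_dir is absent.
    """
    if not config.enabled or not os.path.isdir(temp_dir):
        return []
    now = time.time() if now is None else now
    cutoff = now - config.max_age_hours * 3600
    with os.scandir(temp_dir) as it:
        paths = [entry.path for entry in it]
    removed = []
    for path in paths:
        try:
            if newest_use(path) >= cutoff:
                continue  # something in this entry was used recently; keep it
            if stat.S_ISDIR(os.lstat(path).st_mode):
                shutil.rmtree(path)
            else:
                os.unlink(path)
            removed.append(path)
        except OSError:
            continue  # entry vanished or could not be removed; leave it alone
    return removed
```

The same function would serve both triggers: a teardown hook and the periodic sweep could each call something like `sweep_stale(f"/tmp/claude-{os.getuid()}", CleanupConfig())`. Skipping on any `OSError` errs on the side of leaving files in place rather than failing the agent lifecycle step.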

Acceptance criteria

  • Backend temp dirs stay bounded well under typical /tmp quotas during multi-day continuous operation.
  • No active session's temp is deleted mid-use (conservative age + atime-based selection).
  • Covers claude + codex (+ gemini) backends.
  • Cross-platform safe — Linux primary; macOS/Windows = harmless no-op.
  • Cleanup runs both on agent teardown AND on a periodic timer.
  • Documented (docs/troubleshooting.md + a note in docs/install-vps.md).
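Two of these criteria, the periodic timer and the harmless no-op on non-Linux platforms, can be illustrated with a small scheduler. This is a sketch only: `cleanup_supported` and `start_periodic` are hypothetical names, and treating every non-Linux platform as unsupported is an assumption drawn from the "Linux primary" wording above. The task passed in would be whatever cleanup routine the server uses.

```python
import sys
import threading


def cleanup_supported(platform=sys.platform):
    """Treat only Linux as a cleanup target; other platforms become a no-op."""
    return platform.startswith("linux")


def start_periodic(task, interval_seconds, platform=sys.platform):
    """Run task() every interval_seconds on a daemon thread.

    Returns a threading.Event that stops the loop when set,
    or None when the platform is not supported.
    """
    if not cleanup_supported(platform):
        return None
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout (time to run) and True once stop is set.
        while not stop.wait(interval_seconds):
            try:
                task()
            except Exception:
                pass  # a failed sweep should not take down the server process

    threading.Thread(target=loop, daemon=True, name="temp-sweep").start()
    return stop
```

An hourly sweep would then look like `stop = start_periodic(run_sweep, 3600)`, where `run_sweep` is a placeholder for the cleanup call, with `stop.set()` invoked on server shutdown. A real implementation would presumably log the swallowed exception instead of discarding it.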

Notes

  • Immediate host-level mitigation for anyone hitting this now: systemd-tmpfiles drop-in or cron purging /tmp/claude-* older than ~72h, and/or raise the /tmp per-user quota.
  • Likely also an upstream Claude Code bug (it should bound its own TMPDIR) — worth a separate upstream report.

Metadata

Assignees: No one assigned
Labels: agent/dev (Assigned to Dev agent), bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests