Lifecycle-aware cleanup of agent backend temp dirs (/tmp/claude-{uid}) to prevent /tmp quota exhaustion #957

## Problem

Claude Code overrides `TMPDIR` to `/tmp/claude-{uid}` and **never cleans it up**. On Linux hosts where `/tmp` is mounted with a per-user quota (`usrquota`), this directory grows without bound: it collects Claude's scratch, shell-snapshot, and sandbox-setup files, plus any temp files the agent's commands create. Once the user's `/tmp` quota is exhausted, **every** Claude `bash` call fails silently (exit 1, empty stdout/stderr) because Claude can't write its temp/sandbox-setup files before exec. The agent becomes completely non-functional.

### Observed (VPS, Ubuntu 26.04)
- `/tmp` mounted `tmpfs … usrquota`, ~1 GB per-user quota.
- `/tmp/claude-1000` grew to **909 MB / 63,370 inodes** over ~10 days → quota exhausted.
- Symptom: every bash command (even `true`) → **exit 1, no output, no error surfaced**; the `Read` tool, plain `/bin/bash`, network, and Codex all worked fine.
- **Extremely hard to diagnose** — it masquerades as a bwrap/AppArmor/user-namespace sandbox failure. It is NOT: `bwrap` smoke tests pass, userns works, the sandbox is never even reached. The real signal was a `Disk quota exceeded` on a `/tmp` write. Clearing `/tmp/claude-1000` immediately restored `bash` → exit 0.
- macOS unaffected (no `/tmp` quota; different temp layout).
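
A quick way to confirm this failure mode on a suspect host is to measure the directory and attempt a small write into it. The following is a minimal diagnostic sketch, not part of QuadWork: the helper names `dir_usage` and `probe_write` are hypothetical, Python is used only for illustration, and the entry count only approximates the inode count (hard links are counted once per name).

```python
import errno
import os
import tempfile


def dir_usage(root):
    """Return (apparent_bytes, entry_count) for everything under root.

    Symlinked directories are not followed (os.walk default).
    """
    total_bytes = 0
    entries = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries += 1
            try:
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # entry vanished while we were walking
    return total_bytes, entries


def probe_write(directory):
    """Try to write a small temp file in directory.

    Returns None on success, or the symbolic errno name on failure
    (for an exhausted quota this is expected to be "EDQUOT").
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"x" * 4096)
            handle.flush()
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

For example, `dir_usage("/tmp/claude-1000")` reports how large the directory has grown, and a non-`None` result from `probe_write("/tmp/claude-1000")` points at the write path rather than the sandbox.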

## Why this belongs in QuadWork

A host-level `cron`/`systemd-tmpfiles` purge fixes one machine, but **every QuadWork install on a Linux host with a `/tmp` quota is exposed** and will hit a total, silent agent outage that's near-impossible to diagnose. QuadWork owns the agent lifecycle, so it's the right layer to keep backend temp bounded — and it can clean at known-safe moments (agent teardown) rather than blindly.

## Proposed scope

1. **On agent teardown** (reset / full reset / restart / stop): purge only the stale entries in the backend's temp dir for that uid:
   - Claude: `/tmp/claude-{uid}`
   - Codex / Gemini: confirm where they write temp files and cover the equivalent dirs.
2. **Periodic sweep** from the server process (e.g. hourly): remove `/tmp/claude-*` (+ codex/gemini equivalents) entries older than a conservative age (**48–72h**), selected by atime so currently-active session files are spared (atime is only meaningful if `/tmp` is not mounted `noatime`).
3. Make both the on/off switch and the age threshold **configurable**, with safe defaults.
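
The stale-entry selection in steps 1 and 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not QuadWork's implementation: `sweep_stale` is a hypothetical name, files are judged by the newer of atime and mtime (a deliberately conservative variant of the atime rule), and directories are judged by mtime only, because listing a directory during a sweep can refresh its atime.

```python
import os
import stat
import time


def sweep_stale(root, max_age_seconds, now=None):
    """Remove entries under root that have been unused for max_age_seconds.

    Returns the number of non-directory entries removed. Stale directories
    are removed only if they end up empty; root itself is always kept.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    stale_files, stale_dirs = [], []

    # Pass 1: collect candidates top-down, before anything is deleted,
    # so that our own deletions cannot change the timestamps we judge by.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # vanished mid-walk
            if stat.S_ISDIR(info.st_mode):
                if info.st_mtime < cutoff:
                    stale_dirs.append(path)
            elif max(info.st_atime, info.st_mtime) < cutoff:
                stale_files.append(path)  # includes symlinks, sockets, FIFOs

    # Pass 2: delete stale files, then try the stale directories.
    removed = 0
    for path in stale_files:
        try:
            os.unlink(path)
            removed += 1
        except OSError:
            pass  # already gone or not removable; retry on the next sweep
    # Parents were collected before their children, so reverse the order.
    for path in reversed(stale_dirs):
        try:
            os.rmdir(path)  # fails, harmlessly, if anything fresh remains inside
        except OSError:
            pass
    return removed
```

With a 72h threshold the call would be `sweep_stale("/tmp/claude-1000", 72 * 3600)`. There is still a small race between the timestamp check and the delete, which is one reason the proposed threshold is so conservative.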

## Acceptance criteria

- [ ] Backend temp dirs stay bounded well under typical `/tmp` quotas during multi-day continuous operation.
- [ ] No active session's temp files are deleted mid-use (conservative age + atime-based selection).
- [ ] Covers `claude` + `codex` (+ `gemini`) backends.
- [ ] Cross-platform safe — Linux primary; macOS/Windows = harmless no-op.
- [ ] Cleanup runs both on agent teardown AND on a periodic timer.
- [ ] Documented (`docs/troubleshooting.md` + a note in `docs/install-vps.md`).
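
To make the "teardown AND periodic timer" and "harmless no-op" criteria concrete, here is one possible wiring, sketched under assumptions: `CleanupConfig`, `TempCleanupScheduler`, and their fields are hypothetical names, the defaults simply mirror the numbers proposed above, and the actual cleanup is passed in as a callable that receives the maximum age in seconds.

```python
import sys
import threading
from dataclasses import dataclass


@dataclass
class CleanupConfig:
    enabled: bool = True
    max_age_hours: float = 72.0             # conservative end of the 48-72h range
    sweep_interval_seconds: float = 3600.0  # "hourly"


class TempCleanupScheduler:
    """Runs a cleanup callable on agent teardown and on a periodic timer."""

    def __init__(self, config, cleanup, platform=sys.platform):
        self._config = config
        self._cleanup = cleanup
        # Anything other than Linux, or a disabled config, becomes a no-op.
        self._active = config.enabled and platform.startswith("linux")
        self._stop = threading.Event()
        self._thread = None

    def run_once(self):
        """Run the cleanup now; return False if cleanup is inactive."""
        if not self._active:
            return False
        self._cleanup(self._config.max_age_hours * 3600)
        return True

    def on_agent_teardown(self):
        """Hook for reset / full reset / restart / stop."""
        return self.run_once()

    def start(self):
        """Start the periodic sweep in a daemon thread (idempotent)."""
        if not self._active or self._thread is not None:
            return
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._config.sweep_interval_seconds):
            try:
                self.run_once()
            except Exception:
                pass  # a failed sweep must not kill the timer

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Injecting the cleanup callable keeps the scheduling logic testable without touching `/tmp`, and the same object serves both trigger paths, so teardown and the timer cannot drift apart in behaviour.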

## Notes

- **Immediate host-level mitigation** for anyone hitting this now: add a `systemd-tmpfiles` drop-in or cron job that purges `/tmp/claude-*` entries older than ~72h, and/or raise the `/tmp` per-user quota.
- Likely also an **upstream Claude Code bug** (it should bound its own `TMPDIR`) — worth a separate upstream report.

