diff --git a/M2.7_HANDOFF.md b/M2.7_HANDOFF.md deleted file mode 100644 index b99f46b..0000000 --- a/M2.7_HANDOFF.md +++ /dev/null @@ -1,144 +0,0 @@ -# M2.7 Egress Proxy — Handoff / Current State - -**As of 2026-06-09.** Read this first, then `ROADMAP_1.0.md` (the full plan) and -`SPECIFICATION.md` → "Egress Proxy (M2.7)" (the implemented design). This file is -the *current state + next steps + gotchas*; the roadmap is the *plan*. - -## TL;DR - -The egress proxy is **built, working, and live-validated on both macOS and -Linux**. Branch `m2.7-proxy` is 22 commits ahead of `main`. The immediate next -action is the **squash-merge to `main`**, then the 7-day soak. Codex is the only -red agent and it's a **pre-existing, separate** issue (not the proxy). - -## What M2.7 is - -Closes finding F2 (macOS Docker Desktop has no LAN isolation; iptables can't be -applied from macOS). The agent runs on a Docker `--internal` "sidecar" network -with no route off it except through a dual-homed `sandy-proxy` sidecar (a -golang→scratch Go binary in `proxy/`). Works identically on Linux and macOS -because it relies on `--internal` routing, not iptables. Tri-state -`SANDY_EGRESS_PROXY`: `0` off (legacy iptables on Linux), `1` permissive (block -private/LAN/metadata, allow internet — **now the default**), `2` strict -(allowlist only). - -## Validated (with evidence) - -- **macOS, real Docker Desktop:** `SANDY_EGRESS_PROXY=1` → `api.github.com/zen` - reachable, `http://192.168.1.1` blocked instantly (F2 closed), Claude Code runs - through it. The `SANDY_DEBUG_PROXY=1` probe showed `getent api.anthropic.com → - ` and `curl … http_code=404 remote_ip=` (real API reached - through the proxy). -- **Linux (aarch64 DGX Spark):** same probe → proxy works (Claude reached the API - through it; LAN blocked). -- **Unit suite** (`test/run-tests.sh`): green on macOS, including the section-50 - launcher tests and `proxy/` Go tests. 
- -## Immediate next step: squash-merge to `main` - -The proxy is done and validated. Merging: -1. Resolves the `SANDY_PROXY_REF` caveat — pre-merge, an *installed* `-dev` copy - resolves the proxy-image ref to `main` (no `proxy/` yet) and the default-on - proxy build fails. After merge, `main` has `proxy/` and everything works with - **no** `SANDY_PROXY_REF`. (A repo checkout already works pre-merge via the - branch-aware ref in `_sandy_proxy_ref`.) -2. Sets up the soak. PR 2.7.6 plan: 7-day soak with `SANDY_EGRESS_PROXY=1` as the - default-on candidate. The default flip already landed (commit `b23e7f3`). - -The PR description should summarize: tri-state design, the `--internal` -two-network topology, the default flip + branch-aware ref, iptables kept as the -`=0` fallback (POST_1.0 retire plan), and the live-validation bug hunt below. - -## Bugs found during live bring-up (all fixed — do NOT reintroduce) - -Static tests and the topology spike could not catch these; only running it on -real Docker did. Each is a committed fix with a regression test: -- **`OS` unbound** before the network setup → defined `OS` early (`99ba6e0`). -- **Proxy booted on the `--internal` sidecar** → no default route, upstream dials - failed (ECONNRESET). Boot on egress, attach sidecar with `--ip` (`262bdea`). -- **Networks created before the cleanup trap** → leaked on any failure, exhausted - Docker's address pool. Create after the trap is armed (`48328e4`). -- **Transparent path sent the ClientHello twice** (`up.Write(prefix)` + splice of - the un-consumed `bufio.Peek` buffer) → TLS "protocol version" alert. Drop the - explicit write (`747262c`). Also fixed `extractHTTPHost` premature-truncation. -- **Headless allocated a TTY** → gemini (Ink/React) busy-looped. And **headless - attached an interactive stdin** → `codex exec` blocked reading it. Final rule - (`35306a8`): interactive `-it`; headless + terminal-stdin → no `-i`/`-t`; - headless + piped-stdin → `-i`. 
Integration harness feeds ` Network egress is one of sandy's isolation layers. For the full picture — > the assumed adversary, every layer, and the honest residual risks — see -> [`THREAT_MODEL.md`](THREAT_MODEL.md). Empirical bypass attempts are in -> [`ISOLATION_STRESS.md`](ISOLATION_STRESS.md). +> [`THREAT_MODEL.md`](docs/security/THREAT_MODEL.md). Empirical bypass attempts are in +> [`ISOLATION_STRESS.md`](docs/security/ISOLATION_STRESS.md). ### Egress proxy (`SANDY_EGRESS_PROXY`) — cross-platform isolation diff --git a/TECH_DEBT_REVIEW_FINDINGS.md b/TECH_DEBT_REVIEW_FINDINGS.md deleted file mode 100644 index dd27720..0000000 --- a/TECH_DEBT_REVIEW_FINDINGS.md +++ /dev/null @@ -1,124 +0,0 @@ -# Tech Debt Review: Session Migration (v0.7.10) - -Review date: 2026-03-26 -Scope: Migration code (sandy lines 565-630), auto-resume logic (lines 815-825), tests (test/run-tests.sh section 27) - -This review covers tech debt accumulated during iterative development of the session migration feature across v0.7.10. - ---- - -## RESOLVED - -### ~~1. `.claude.json` write is not atomic~~ — FIXED - -Atomic write via tmp + rename pattern now in place (lines 623-626). - -### ~~2. `.claude.json` migration has no test coverage~~ — FIXED - -Added tests 6-9 in section 27: trust consolidation, no-op, malformed JSON, trailing newline. - -### ~~3. Missing trailing newline on `.claude.json` write~~ — FIXED - -Now writes `JSON.stringify(d, null, 2) + "\n"` (line 625). - -### ~~5. Silent error swallowing masks migration failures~~ — FIXED - -All three migration steps now emit yellow warnings on failure (lines 581, 589, 628) instead of silently continuing. - -### ~~10. Comment on `cp -an` doesn't explain no-clobber behavior~~ — FIXED - -Comment now explains both `-a` (archive) and `-n` (no-clobber) semantics (lines 576-577). - ---- - -## OPEN — MEDIUM Severity - -### 4. 
`sed` replacement on `history.jsonl` doesn't escape `$WORKSPACE` - -**Line:** 588 - -```bash -sed -i "s|\"project\":\"[^\"]*\"|\"project\":\"$WORKSPACE\"|g" "$HOME/.claude/history.jsonl" -``` - -If `$WORKSPACE` contains `&` (sed replacement metacharacter) or `\`, the replacement will be corrupted. In practice, `$WORKSPACE` is always a container path like `/home/claude/dev/...` (constructed at sandy line 1506), so special characters are extremely unlikely. But the code is fragile by construction. - -**Fix:** Escape the replacement string: -```bash -_ws_escaped="$(printf '%s\n' "$WORKSPACE" | sed 's/[&/\]/\\&/g')" -sed -i "s|\"project\":\"[^\"]*\"|\"project\":\"$_ws_escaped\"|g" ... -``` - -### 6. Migration merges ALL `.claude.json` project entries, including unrelated ones - -**Lines:** 605, 619 - -The `.claude.json` is seeded from the host's `~/.claude.json` on first sandbox creation (line 1360). The host file may contain entries for unrelated projects (e.g., `/Users/rappdw/dev/mws`). The migration deletes ALL entries except the current workspace, merging their `allowedTools` and trust state into the current entry. - -**Impact:** Minimal in practice — extra `allowedTools` are harmless in a sandboxed environment, and inheriting `hasTrustDialogAccepted: true` from an unrelated project is benign (the sandbox IS trusted). But it conflates stats like `lastCost` and `lastSessionId` from unrelated projects. - -**Possible fix:** Only migrate entries whose paths match known era patterns for the current project. This would require reverse-mapping the sandbox to its host project path, which adds complexity. May not be worth it given the low impact. - -### 7. `_sessions_migrated` flag has misleading semantics - -**Lines:** 570, 579, 820 - -The flag is set when `cp -an` succeeds (project dirs merged) and used to choose `--resume` vs `--continue`. But it doesn't reflect whether `history.jsonl` or `.claude.json` migration succeeded. 
The flag name suggests "all session state was migrated" when it only means "some files were copied." - -**Fix:** Rename to `_project_dirs_merged` for clarity, or set the flag based on a broader condition. - ---- - -## OPEN — LOW Severity - -### 8. Underscore-prefixed variables pollute global scope in heredoc - -**Lines:** 569-583 - -Variables `_cur_proj`, `_sessions_migrated`, `_old_proj` are set in the heredoc's top-level scope (not inside a function). Since `user-setup.sh` runs as a script (not sourced), this is harmless — the variables die with the process. Not a real issue unless `user-setup.sh` execution model changes. - -### 9. `ls` glob for session detection - -**Line:** 819 - -```bash -if ls "$SESSION_DIR"*.jsonl &>/dev/null; then -``` - -Using `ls` for existence testing is discouraged (see ShellCheck SC2012). A more robust alternative: - -```bash -if compgen -G "${SESSION_DIR}*.jsonl" >/dev/null 2>&1; then -``` - ---- - -## Documentation Gap - -### 11. CLAUDE.md has no section on session migration or auto-resume - -The CLAUDE.md documents workspace mount paths and per-project sandboxes but never explains: -- How Claude Code session state is structured (project dirs, `history.jsonl`, `.claude.json` project entries) -- The three path eras and why migration exists -- The auto-resume/auto-continue behavior (`SANDY_AUTO_CONTINUE`) -- What `--resume` vs `--continue` means for the user after migration - -This context is critical for anyone debugging session issues in the future. 
- ---- - -## Summary - -| # | Issue | Severity | Status | -|---|-------|----------|--------| -| 1 | `.claude.json` write not atomic | HIGH | FIXED | -| 2 | `.claude.json` migration untested | HIGH | FIXED | -| 3 | Missing trailing newline | HIGH | FIXED | -| 5 | Silent error swallowing | MEDIUM | FIXED | -| 10 | Comment missing `-n` explanation | LOW | FIXED | -| 4 | `sed` doesn't escape `$WORKSPACE` | MEDIUM | Open | -| 6 | Merges unrelated project entries | MEDIUM | Open | -| 7 | `_sessions_migrated` naming | MEDIUM | Open | -| 8 | Underscore vars in global scope | LOW | Open | -| 9 | `ls` for session detection | LOW | Open | -| 11 | CLAUDE.md missing session docs | LOW | Open | diff --git a/TODO.md b/TODO.md deleted file mode 100644 index 5a37890..0000000 --- a/TODO.md +++ /dev/null @@ -1,92 +0,0 @@ -# Sandy TODO - -## Lessons from Anthropic's sandbox-runtime - -Analysis of [sandbox-runtime](https://github.com/anthropic-experimental/sandbox-runtime) (Anthropic's official OS-level sandbox using bubblewrap/Seatbelt) for capabilities sandy should consider. - -### High Value - -- [x] **Mandatory protected files** — Mount shell configs (`.bashrc`, `.zshrc`, etc.), `.git/hooks/`, `.claude/commands/`, `.claude/agents/`, `.vscode/`, `.idea/` as read-only inside the container. Prevents config injection and git hook tampering. *(Implemented: read-only bind mount overlays at container launch.)* - -- [ ] **Domain-based network filtering** — srt uses HTTP + SOCKS5 proxies to filter outbound traffic by domain (allow `github.com`, `npmjs.org`, block everything else). Sandy's iptables approach blocks LAN/private ranges but allows all internet traffic. A proxy layer would give finer control and prevent data exfiltration to arbitrary domains. Could be opt-in via `SANDY_ALLOWED_DOMAINS` or `.sandy/network.conf`. - -- [ ] **Violation monitoring / logging** — srt logs sandbox violations in real-time so users can see what was blocked. Sandy currently blocks silently. 
Adding visibility (at minimum, logging denied network connections and write attempts to protected files) would help debugging and build trust. Could log to `~/.sandy/sandboxes//violations.log`. - -- [x] **Symlink protection** — Scans workspace for symlinks that escape the project tree before mounting. Prompts user to confirm if dangerous symlinks are found. Skips node_modules, .venv, and .git directories. *(Implemented: interactive prompt at startup.)* - -- [ ] **`.env` file protection** — Mount `.env`, `.env.*`, `.env.local` files read-only inside the container. Claude Code running with `--dangerously-skip-permissions` can currently `cat .env` in the mounted project directory. Gemini CLI masks these by bind-mounting zero-permission files over them. Sandy should at minimum mount them read-only; masking entirely is also an option. Scan workspace up to 3 levels deep (excluding `node_modules/`, `.venv*/`, `.git/`) for files matching `.env*` and add them to the protected files list. - -### Medium Value - -- [ ] **Dynamic config updates** — srt supports `--control-fd` for runtime permission changes without restarting the sandbox process. Sandy's model (one container per session) makes this less critical, but a mechanism to reload config (e.g., re-reading `.sandy/config` on signal) could be useful for long-running sessions. - -- [ ] **MITM proxy support** — srt can route traffic through an inspection proxy with custom CA certs. Useful for enterprises that need traffic visibility or have corporate proxies. Could integrate with the domain-based filtering above. - -- [ ] **Configuration validation** — srt uses Zod schemas for strict config validation. Sandy has no config file yet, but if one is added (e.g., `.sandy/config.json` for network rules, protected paths, resource limits), schema validation would prevent misconfiguration. - -### Lower Priority / Future - -- [ ] **Per-command sandboxing** — srt can sandbox individual commands, not just entire sessions. 
Sandy isolates the whole session. Per-command isolation would be a significant architectural change but could allow finer-grained permissions. - -- [ ] **macOS native sandbox fallback** — For users without Docker, could use macOS `sandbox-exec` (Seatbelt) as a lighter-weight alternative. This is what srt does natively. Would broaden sandy's audience but is a large effort. - -- [ ] **Web UI / monitoring dashboard** — Community project [sandboxed.sh](https://github.com/Th0rgal/sandboxed.sh) has a browser interface for managing multiple agent sessions. Could be valuable for teams. - -## Community & Discoverability - -- [ ] **Get listed in [awesome-claude-code](https://github.com/hesreallyhim/awesome-claude-code)** — Sandy doesn't appear in any community lists. Would increase visibility. - -
- How to submit & draft issue text - - **Steps:** - 1. Go to https://github.com/hesreallyhim/awesome-claude-code/issues/new - 2. Select the **"Recommend a resource"** issue template (do NOT submit a PR — only the repo owner's Claude submits PRs) - 3. Fill in the template with the details below - - **Suggested section:** Tooling > General (alongside `run-claude-docker`, `viwo-cli`, `TSK`) - - **Resource name:** sandy - - **Resource URL:** https://github.com/rappdw/sandy - - **Description:** - - > **sandy** — an isolated sibling for your coding agents. A single command that runs Claude Code or Gemini CLI (or both, side-by-side) in a fully sandboxed Docker container. - > - > `curl | bash` install, then just run `sandy` from any project directory. No config needed. - > - > **Key features:** - > - **Filesystem isolation**: Read-only root filesystem, non-root user, no-new-privileges - > - **Network isolation**: LAN/private ranges blocked via iptables, internet access preserved - > - **Per-project sandboxes**: Each project gets its own isolated `~/.claude`, credentials, and package storage - > - **Persistent dev environments**: pip, npm, go, cargo, and uv installs survive across sessions per project - > - **Multi-language toolchains**: Python 3, Node.js 22, Go 1.24, Rust stable, C/C++ pre-installed - > - **Protected files**: Shell configs, `.git/hooks/`, `.claude/commands/`, `.claude/agents/` mounted read-only to prevent injection attacks - > - **Per-project Dockerfile**: Drop a `.sandy/Dockerfile` in your project to layer custom tools (e.g., quarto, typst) on top - > - **SSH agent relay**: Token-based (default) or socket-forwarded git authentication - > - **Auto-update**: Detects new Claude Code releases and rebuilds automatically - > - **Git submodule support**: Correctly mounts worktree and gitdir for submodule workspaces - > - > Self-contained bash script (~1,850 lines). Works on Linux and macOS (via Docker Desktop or Colima). - -
- -- [x] **Emphasize the "one command" story** — Projects like [cco](https://github.com/nikvdp/cco) (173 stars) and [ClaudeCage](https://github.com/PACHAKUTlQ/ClaudeCage) (134 stars) are popular partly because they're drop-in `claude` replacements. Sandy has a similar UX (`curl | bash` install + `sandy` command) but could market this more prominently. *(Done: README now leads with three-line install-and-run.)* - -- [x] **Document Docker Desktop alternatives for macOS** — A significant user segment wants sandboxing on macOS without Docker Desktop. Sandy just needs a Docker-compatible CLI — document alternatives like Rancher Desktop, Colima, and Lima that provide this without a Docker Desktop license. *(Done: README Prerequisites section lists Rancher Desktop, Docker Desktop, Colima, and Lima.)* - -## Plugin Marketplaces - -Sandy currently seeds two plugin marketplaces via `extraKnownMarketplaces` in settings.json: `claude-plugins-official` (Anthropic) and `sandy-plugins` (rappdw). Consider seeding additional community marketplaces to give users a richer plugin catalog out of the box. - -- [ ] **Evaluate and seed community plugin marketplaces** — Candidates to investigate (verify repos exist, have valid `marketplace.json`, and are actively maintained before adding): - - | Marketplace | Focus | Why consider | - |---|---|---| - | `claudebase/marketplace` | Full-stack dev + security (SAST, dependency scanning) | Aligns with sandy's security-conscious audience | - | `ahmedasmar/devops-claude-skills` | Terraform, K8s, CI/CD, GitOps, AWS | Natural fit for devs running containerized workflows | - | `alirezarezvani/claude-skills` | 190+ skills across 9 domains | Broadest coverage, actively maintained | - | `kivilaid/plugin-marketplace` | 100+ plugins, code review/testing/deployment | Good breadth | - - **Note:** `obra/superpowers-marketplace` has been merged into the official Anthropic marketplace — no need to add separately. 
Some of the above may also migrate to official over time; check before adding. diff --git a/POST_1.0_IDEAS.md b/docs/POST_1.0_IDEAS.md similarity index 91% rename from POST_1.0_IDEAS.md rename to docs/POST_1.0_IDEAS.md index 0a6cb50..8404874 100644 --- a/POST_1.0_IDEAS.md +++ b/docs/POST_1.0_IDEAS.md @@ -447,3 +447,38 @@ Small *launcher* change (validate + detect + one `--runtime` flag) + the privileged-key metadata + docs. The real work is **compatibility soak**, not code — hence opt-in, Linux-first, and clearly labeled "strong-isolation tier, expect some workloads to need `runc`." + +--- + +## Lessons from Anthropic's sandbox-runtime (srt) — carried over from TODO.md + +**Target: assorted (1.1+).** Consolidated 2026-06-11 from the old root `TODO.md` +(an analysis of [sandbox-runtime](https://github.com/anthropic-experimental/sandbox-runtime)). +One item already **shipped**, the rest are parked here; the deeper analysis lives +under `research/`. + +- **Domain-based network filtering — ✅ SHIPPED** as the M2.7 egress proxy + (`SANDY_EGRESS_PROXY` permissive/strict + `SANDY_ALLOW_HOSTS`). This was the + headline srt lesson; it's done. +- **`.env` / secret-file protection (highest-value remaining).** `.env`, + `.env.*`, `.env.local` are **not** in the protected-paths list, so a + prompt-injected agent can `cat` a project's secrets and (in permissive mode) + exfiltrate them. srt and Gemini CLI both address this — Gemini bind-mounts + zero-permission files over them (masking). Fix: scan the workspace ≤3 levels + (excluding `node_modules/`, `.venv*/`, `.git/`) for `.env*` and add them to the + protected list — read-only at minimum, **masked** ideally (reading is the real + risk, and RO only stops writes). See also `docs/security/THREAT_MODEL.md` R2. +- **Violation logging.** sandy blocks silently; srt logs blocked connections / + write attempts in real time. 
At minimum, log denied egress (the proxy already + has a deny log behind `SANDY_DEBUG_PROXY`) and protected-path write attempts to + `~/.sandy/sandboxes//violations.log` for debuggability + trust. +- **macOS native sandbox fallback.** For users without Docker, `sandbox-exec` + (Seatbelt) as a lighter-weight alternative — broadens reach, large effort. +- **Per-command sandboxing.** srt can sandbox individual commands, not just whole + sessions. A significant architectural change; finer-grained but heavy. +- **Dynamic config reload** (srt's `--control-fd`) and **MITM/inspection-proxy + support** (corporate CA, traffic visibility — composes with strict mode and the + POST_1.0 host-relay broker) — both lower priority. + +(Dropped from the old TODO as not-isolation/marketing: awesome-claude-code +listing, community plugin marketplaces, a web-UI dashboard.) diff --git a/ROADMAP_1.0.md b/docs/ROADMAP_1.0.md similarity index 100% rename from ROADMAP_1.0.md rename to docs/ROADMAP_1.0.md diff --git a/TESTING_PLAN.md b/docs/TESTING_PLAN.md similarity index 100% rename from TESTING_PLAN.md rename to docs/TESTING_PLAN.md diff --git a/ISOLATION_STRESS.md b/docs/security/ISOLATION_STRESS.md similarity index 100% rename from ISOLATION_STRESS.md rename to docs/security/ISOLATION_STRESS.md diff --git a/ISOLATION_STRESS_LINUX_PROMPT.md b/docs/security/ISOLATION_STRESS_LINUX_PROMPT.md similarity index 100% rename from ISOLATION_STRESS_LINUX_PROMPT.md rename to docs/security/ISOLATION_STRESS_LINUX_PROMPT.md diff --git a/THREAT_MODEL.md b/docs/security/THREAT_MODEL.md similarity index 96% rename from THREAT_MODEL.md rename to docs/security/THREAT_MODEL.md index b93b753..a7e2919 100644 --- a/THREAT_MODEL.md +++ b/docs/security/THREAT_MODEL.md @@ -103,6 +103,7 @@ Judged against the **primary** adversary, these are mostly accepted; against the |---|---|---|---| | R1 | **Shared kernel.** Namespaces are a soft boundary; a kernel/`runc` exploit = host escape (Linux native). 
| determined | Open. Mitigation: optional **gVisor runtime** (see POST_1.0_IDEAS). macOS has a VM boundary to the Mac. | | R2 | **Prevention vs detection (filesystem).** Absent protected dirs (`.github/workflows/` …) can be *created* and persist to the host, detected only at session end. | wrong-agent / committed-config | Open. Roadmap: **fanotify `FAN_OPEN_PERM`** for true prevention. | +| R2b | **Project secrets in `.env` are readable.** `.env`/`.env.*`/`.env.local` are *not* in the protected-paths list, so a prompt-injected agent can read a project's secrets (and exfiltrate them in permissive egress). | wrong-agent | Open. Roadmap: add `.env*` to protected paths, read-only or **masked** (see `POST_1.0_IDEAS.md`). | | R3 | **Exfil not fully closed.** Permissive reaches any public host; strict's *allowlisted* hosts (GitHub gist, `npm publish`) are exfil channels. The agent holds its own session creds. | determined | Partial. Roadmap: **strict-mode host-relay broker** (POST_1.0). | | R4 | **Workspace supply-chain.** The agent can poison the project (`package.json` deps, build scripts, `Makefile`) that the *user* later runs on the host. | wrong-agent / determined | Semi-fundamental — the workspace must be writable. Mitigated only for auto-executing dotfiles. | | R5 | **Within-sandbox persistence.** RW persistent mounts (`cargo`/`npm`/`pip` bins on PATH, `.claude` plugins/commands) let a malicious tool fire in a later session for that project. | determined | Open; bounded to the sandbox (not the host). | diff --git a/refactor-prompt.md b/refactor-prompt.md deleted file mode 100644 index ebf156b..0000000 --- a/refactor-prompt.md +++ /dev/null @@ -1,118 +0,0 @@ -# Sandy Refactor: Combined Analysis Prompt - -Use this as a one-shot prompt with sandy or Claude Code. It's designed to spawn parallel subagents for the compatibility audit and structural cleanup, then synthesize findings into a single prioritized action plan. 
- ---- - -## The Prompt - -``` -I want to do a thorough refactor and cleanup pass on the sandy codebase. Spawn three subagents to work in parallel, then synthesize their findings into a single prioritized action plan saved to analysis/refactor-plan.md. - -IMPORTANT CONTEXT: The analysis/ directory contains a prior security/architecture audit and TODO.md has a roadmap from that work. Each agent should read these first to avoid duplicating prior findings. Focus on what's NEW or was MISSED. - ---- - -### Agent 1: Claude Code Compatibility Audit - -Research what has changed in Claude Code from late 2025 through March 2026, then compare against what sandy assumes. Check each of these specifically: - -**Credentials & OAuth:** -- Sandy seeds .credentials.json into the container (line ~948) and mounts .claude.json (line ~950). Recent Claude Code versions require BOTH files for a session to work. Verify .claude.json schema — sandy seeds `tipsDisabled` and `installMethod` (lines 836-844). Are there new required fields? Has the OAuth session state structure changed? -- Is the new CLAUDE_CODE_OAUTH_TOKEN env var something sandy should support or explicitly block? - -**Environment variables (entrypoint, lines 290-293):** -- DISABLE_AUTOUPDATER=1 — still valid? -- DISABLE_SPINNER_TIPS=1 — still recognized, or renamed/removed? -- CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 — still experimental, graduated, or gone? -- CLAUDE_CODE_MAX_OUTPUT_TOKENS=128000 — sandy hardcodes this. Default is now 64k for Opus 4.6. Is the env var still respected? Any new behavior? -- Any NEW env vars sandy should set? (CLAUDE_CODE_DISABLE_CRON, CLAUDE_CODE_DISABLE_1M_CONTEXT, etc.) - -**Installation method (Dockerfile, line ~196):** -- Sandy installs via `curl -fsSL https://claude.ai/install.sh | bash`. Is this still the recommended path? Has the installed file layout changed from ~/.local/{bin,share}/claude? -- Sandy relocates the binary to /usr/local/bin/claude and data to /opt/claude-code. Still valid? 
-- The version check hits `https://storage.googleapis.com/claude-code-dist-86c565f3.../latest` (line ~529). Is this endpoint still active? - -**Permission model & flags:** -- `--dangerously-skip-permissions` (line ~407) — still supported? -- `--teammate-mode tmux` (line ~404) — still the right flag name/value? -- `claude remote-control` (line ~397) — still supported? -- `bypassPermissions` in settings — still honored? Any new sandbox keys? - -**~/.claude directory structure:** -- Are there new subdirectories (debug/, plugins/, session-env/, file-history/) that need persistence or explicit exclusion? -- Session file path: sandy checks `~/.claude/projects/-workspace/*.jsonl` for auto-resume (line ~422). Has session storage changed? - -**Deliverable:** A findings list with three categories: BROKEN (definitely wrong), STALE (probably fine but references deprecated things), CURRENT (verified still correct). Include line numbers and sources. - ---- - -### Agent 2: Structural Cleanup & Dead Code - -Analyze the sandy script (1,183 lines) for internal code quality issues. Read analysis/ and TODO.md first to skip known items. - -**Dead code & unreachable branches:** -- Variables set but never read? Flag parsing options that don't connect to anything? -- The python3 fallback SSH relay (lines ~1108-1132) duplicates socat, which is guaranteed to be on the host (macOS preflight at line ~576 enforces it) AND in the container (base image). Is the python3 fallback dead code? -- The python3 fallback in token_needs_refresh() (lines ~889-901) — node is guaranteed in the base image AND on the host (install.sh checks for it). Dead code? - -**Duplication:** -- `shasum -a 256 2>/dev/null || sha256sum` appears on lines ~506 and ~519. Factor into a helper. -- The .claude.json node -e blocks (lines ~836-844 and ~851-862) are near-duplicates. Merge? - -**Version mismatch:** -- SANDY_VERSION="0.5.0" (line 19) but RELEASE_NOTES.md says v0.6.0. Confirm and flag. 
- -**Heredoc sprawl:** -- ensure_build_files() is ~360 lines because it contains 5 heredocs. The tmux.conf and entrypoint.sh are written to disk anyway — should they be separate source files? What's the tradeoff? - -**Entrypoint complexity:** -- The entrypoint mixes root-phase (lines ~207-286) and user-phase (lines ~287-434) in one heredoc. The user phase is ~150 lines of single-quoted bash inside `exec gosu ... bash -c '...'`. This is hard to debug, hard to shellcheck, and quoting errors are invisible. Evaluate splitting into two scripts (root-entrypoint.sh + user-setup.sh). - -**Network isolation:** -- The iptables section (lines ~769-824) handles 5 private ranges + container subnet exception. Could `docker network create --internal` plus a selective allowlist be simpler? What would break? - -**Session auto-resume:** -- Lines ~411-425 use `ls *.jsonl` to detect prior sessions. Is globbing reliable here? Edge cases? - -**Deliverable:** Prioritized list in three tiers: (1) likely bugs or correctness issues, (2) simplifications that cut lines or reduce complexity, (3) readability improvements. Line references for everything. - ---- - -### Agent 3: Settings, Config & Hardening Review - -Review sandy's configuration surface and security posture for cruft, missed hardening, and UX issues. - -**Settings generation (.claude.json, lines ~826-863):** -- Sandy generates .claude.json with tipsDisabled and installMethod. It does NOT set bypassPermissions here (that's via the CLI flag). Is there drift between what the CLI flag sets and what the file contains? Could they conflict? -- The node -e JSON manipulation is fragile — no error handling if the JSON is malformed. Evaluate using jq (available? not in base image) or making the node script more defensive. - -**Per-project config (.sandy/config, line ~569):** -- This is `source`'d as raw bash. Any injection risk if the workspace is untrusted? 
The script already validates SANDY_HOME for metacharacters (line ~23) but not individual config values. -- Are there config keys documented in README/CLAUDE.md that aren't actually read anywhere in the script? - -**Protected files (somewhere in mount setup):** -- Find where protected file mounts are set up. Are there new sensitive paths that should be protected? (e.g., .claude/plugins/, .claude/agents/ — agents/ is listed in CLAUDE.md as protected, verify it's implemented) - -**Resource defaults:** -- SANDY_CPUS defaults to all available (line ~500). SANDY_MEM defaults to available minus 1GB (line ~501). Are these reasonable? Should there be a cap? -- tmpfs sizes: /tmp at 1G (line ~943), /home/claude at 2G (line ~944). The CLAUDE.md mentions 2GB limit. Has usage grown with larger Claude Code installs? - -**Cleanup on exit:** -- The cleanup() trap (lines ~811-823) handles network rules, Docker network, cred tmpdir, and SSH relay PID. Is anything leaked on unclean exit? What about the CRED_TMPDIR if the script is killed with SIGKILL? - -**Deliverable:** Findings list with categories: SECURITY (hardening gaps), CRUFT (stale settings or dead config), UX (improvements for the operator). Include line references. - ---- - -### Synthesis - -After all three agents complete, combine their findings into a single `analysis/refactor-plan.md` with: - -1. **Critical fixes** — things that are broken or will break soon -2. **Quick wins** — low-effort, high-value cleanups (dead code removal, version bump, etc.) -3. **Refactors** — larger structural changes with effort estimates and tradeoffs -4. **Deferred** — things noted but not worth doing now, with rationale - -For each item, include: the finding, which agent surfaced it, affected lines, and a suggested fix or approach. 
-``` diff --git a/HANDOFF_TO_ALICE.md b/research/HANDOFF_TO_ALICE.md similarity index 100% rename from HANDOFF_TO_ALICE.md rename to research/HANDOFF_TO_ALICE.md diff --git a/HANDOFF_TO_SANDY.md b/research/HANDOFF_TO_SANDY.md similarity index 100% rename from HANDOFF_TO_SANDY.md rename to research/HANDOFF_TO_SANDY.md
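
Review note on the `.env` / secret-file item this diff adds to `docs/POST_1.0_IDEAS.md` ("scan the workspace ≤3 levels, excluding `node_modules/`, `.venv*/`, `.git/`, for `.env*`"): a minimal sketch of what that scan could look like, not sandy's implementation. The function name `find_env_files` is hypothetical, "≤3 levels" is assumed to mean `find -maxdepth 3` (the workspace root counts as depth 0), and `-maxdepth` is assumed available (GNU and BSD `find` both provide it, though it is not POSIX).

```shell
# Hypothetical helper (not part of sandy): print .env-style regular files found
# at most 3 levels below the given workspace root (default: current directory).
# Directories named node_modules, .venv* or .git are pruned, so nothing beneath
# them is reported.
find_env_files() {
    find "${1:-.}" -maxdepth 3 \
        \( -type d \( -name node_modules -o -name '.venv*' -o -name .git \) -prune \) \
        -o \( -type f -name '.env*' -print \)
}

# Example use (assumption: the caller decides how each path gets protected,
# e.g. a read-only or masking bind mount at container launch):
#   find_env_files "$PWD" | while IFS= read -r f; do
#       printf 'candidate for protection: %s\n' "$f"
#   done
```

The `.env*` pattern also matches names such as `.envrc`, and `-type f` skips symlinks, so a real implementation would need to decide whether either behavior is wanted. Whether matched files end up read-only or masked is a separate choice; the diff itself notes that read-only only stops writes, while reading is the actual risk.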